Abstract

We present a retrieval method based on Bayesian analysis to infer the atmospheric compositions and 
surface or cloud-top pressures from transmission spectra of exoplanets with general compositions. In 
this study, we identify what can unambiguously be determined about the atmospheres of exoplanets 
from their transmission spectra by applying the retrieval method to synthetic observations of the 
super-Earth GJ 1214b. Our approach to infer constraints on atmospheric parameters is to compute 
their joint and marginal posterior probability distributions using the Markov Chain Monte Carlo 
technique in a parallel tempering scheme. A new atmospheric parameterization is introduced that is 
applicable to general atmospheres in which the main constituent is not known a priori and clouds may 
be present. 

Our main finding is that a unique constraint on the mixing ratios of the absorbers and two spectrally
inactive gases (such as N2 and primordial H2+He) is possible if the observations are sufficient to
quantify both (1) the broadband transit depths in at least one absorption feature for each absorber 
and (2) the slope and strength of the molecular Rayleigh scattering signature. A second finding is that 
the surface pressure or cloud-top pressure can be quantified if a surface or cloud deck is present at low 
optical depth. A third finding is that the mean molecular mass can be constrained by measuring either 
the Rayleigh scattering slope or the shapes of the absorption features, thus enabling one to distinguish 
between cloudy hydrogen-rich atmospheres and high mean molecular mass atmospheres. We conclude, 
however, that without the signature of molecular Rayleigh scattering, even with robustly detected
infrared absorption features (>10σ), there is no reliable way to tell from the transmission spectrum
whether the absorber is a main constituent of the atmosphere or just a minor species with a mixing ratio
of X_abs < 0.1%. The retrieval method leads us to a conceptual picture of which details in transmission
spectra are essential for unique characterizations of well-mixed exoplanet atmospheres.

Subject headings: methods: numerical - planets and satellites: atmospheres - planetary systems: indi- 
vidual (GJ 1214b) 



1. INTRODUCTION 

Major advances in the detection and characteriza- 
tion of exoplanet atmospheres have been made over the 
last decade. To date, several dozen hot Jupiter atmospheres
have been observed by the Spitzer Space Telescope,
Hubble Space Telescope and/or ground-based observations.
Observational highlights include the detection
of molecules and atoms (e.g., Charbonneau et al.
2002; Deming et al. 2005; Seager & Deming 2010) and
the identification of thermal inversions (Knutson et al.
2008). Recent observational efforts (e.g., Bean et al.
2010; Croll et al. 2011; Berta et al. 2012) suggest that
the continuous improvements in observational techniques 
will enable us to extend the field of atmospheric charac- 
terization to the regime of super-Earths (exoplanets with 
mass between 1 and 10 M⊕) in the near future.

Since super-Earth exoplanets lie in the intermediate 
mass range between terrestrial planets and the gas/ice 
giants in the Solar System, compelling questions arise 
as to the nature and formation histories of these objects 
and whether they are capable of harboring life. A po- 
tential way of answering these questions is to constrain 
the molecular compositions and thicknesses of their atmospheres
from spectral observations of the transmission
and/or emission spectra (Miller-Ricci et al. 2009). While
a thick hydrogen/helium envelope would indicate that
their formation histories are similar to those of the gas 
or ice giant planets in the Solar System, super-Earths 
that are predominately solid planets may be scaled-up 
analogs of the terrestrial planets in our solar system. Al- 
ternative scenarios of planets different in nature to the 
solar system planets, such as planets mainly composed 
of water or carbon compounds, have been proposed as 
well (Kuchner 2003; Léger et al. 2004; Kuchner & Seager
2005).

As the first observations of the transmission spec- 
trum of the super-Earth GJ 1214b become available, 
the current practice in interpreting these spectra is 
to check the observations for their agreement with preconceived
atmospheric scenarios (Miller-Ricci & Fortney
2010; Bean et al. 2010; Croll et al. 2011). There are two
dangers with this approach: First, even if a good fit is 
reached between the data and the model spectrum of a 
preconceived scenario, we do not know whether we actu- 
ally understand the nature of the planet or whether we 
have simply found one out of several possible scenarios 
matching the data. Second, and even more important, 
we will not be able to understand planets that do not fit 
our preconceived ideas. These planets, however, would 
likely represent the most compelling science cases as they 
may provide new insights into planetary formation and 
evolution, atmospheric chemistry, or astrobiology. 



Here, we present a new tool for the interpretation of 
transmission spectra of transiting super-Earth and mini-Neptune
exoplanets. The approach is fundamentally different
from previous work on super-Earth atmospheres
in that we retrieve constraints on the atmospheric com- 
position by exploring a wide range of atmospheres with 
self-consistent temperature structures. Our approach 
builds on the idea introduced in the pioneering works
on hot Jupiters by Madhusudhan & Seager (2009) and
Madhusudhan et al. (2011a) to use Monte Carlo methods
to explore the parameter space for solutions that are
in agreement with the observations. The method pre- 
sented here is different in three ways. First, our retrieval 
method is applicable to atmospheres of general composi- 
tion and considers the presence of a cloud deck or solid 
surface. We introduce concepts of compositional data
analysis, a subfield of statistical analysis, to treat the 
mixing ratios of all molecular constituents equally while 
ensuring that the sum of the mixing ratios is unity. Sec- 
ond, we use a radiative-convective model to calculate a 
temperature profile that is self-consistent with the molecular
composition of each model atmosphere. Third, we
conduct a full Bayesian analysis and infer our constraints 
on atmospheric parameters directly by marginalizing the 
joint posterior probability distribution obtained from the 
Markov Chain Monte Carlo (MCMC) simulation. We 
therefore obtain the most likely estimates and statistically
significant Bayesian credible intervals for each parameter.
Madhusudhan et al. (2011b) used the MCMC
algorithm to explore the model parameter space in the
search for regions that provide good fits to the data.
Based on the parameter exploration, they were able to
report contours of constant goodness-of-fit in the parameter
space. Contours of constant goodness-of-fit, however,
cannot directly be related to the confidence regions
of the desired parameters. 

The retrieval method presented in this work is different
from optimum estimation retrieval, as described by
Rodgers (2000) and recently applied to exoplanets by
Lee et al. (2011) and Line et al. (2012), in that we
derive the full probability distributions and Bayesian 
credible regions for the desired atmospheric parame- 
ters, while optimum estimation retrieval assumes Gaus- 
sian errors around a single best-fitting solution. Highly 
non-Gaussian uncertainties of the atmospheric param- 
eters need to be considered for noisy exoplanet obser- 
vations because the observable atmospheric spectra are 
highly nonlinear functions of the desired atmospheric parameters,
and a large volume of the parameter space
is generally compatible with noisy exoplanet spectra.
Our approach to calculate the joint posterior probability 
distribution using MCMC is computationally intensive 
(~10^5 model evaluations are required), but it enables
us to extract all that can be inferred about the atmo- 
spheric parameters from the observational data. The 
uncertainty of individual atmospheric parameters intro- 
duced by complex, non-Gaussian correlations to other 
parameters is accounted for in a straightforward way 
by marginalizing over the remaining parameters. Opti- 
mum estimation retrieval, in contrast, searches for a sin- 
gle best-fitting solution using the Levenberg-Marquardt 
algorithm. Gaussian uncertainties are estimated around
the best-fitting solution by linear analysis (Rodgers 2000;
Lee et al. 2011) or by performing multiple retrieval runs
with individual parameters fixed at particular values
(Lee et al. 2011). Optimum estimation retrieval requires
fewer model evaluations (typically ~10 to 20 per retrieval
run, multiplied by the number of retrieval runs performed 
with individual parameters fixed) and may therefore al- 
low the use of more complex atmospheric models given 
the same computational resources. For noisy exoplanet 
spectra, however, the optimal estimation retrieval may 
not correctly represent the confidence regions of the at- 
mospheric parameters because the uncertainties of the 
atmospheric parameters are highly non-Gaussian. 

In this work, we investigate what we can learn about 
the atmospheres of super-Earths solely based on trans- 
mission spectroscopy by applying the retrieval method 
to a sample of synthetic observations of different super- 
Earth scenarios. Previous studies have shown that the 
atmospheric composition as well as the presence of a 
solid surface and clouds affect the planetary spectrum
(e.g., Seager & Sasselov 2000; Des Marais et al. 2002;
Ehrenreich et al. 2006), but no comprehensive study of
the degeneracy of these effects has been performed. As 
a result, it is not fully understood which individual at- 
mospheric parameters can be inferred uniquely from the 
spectrum and which parameters are strongly correlated 
or degenerate. In particular, for super-Earth atmospheres
for which the formation history and subsequent
evolution are not understood, we do not know the mean
molecular mass and the thickness of the atmosphere a 
priori. For such planets, the depths of individual absorp- 
tion features in the spectrum are affected not only by 
the mixing ratio of the absorbing molecular species, but 
also by the unknown mean molecular mass of the back- 
ground atmosphere and the surface or cloud deck pres- 
sure. Strong correlations or degeneracies between these 
atmospheric properties are therefore expected, but have 
not been addressed sufficiently in the literature. 

The paper outline is as follows. We introduce the new 
retrieval method in Section 2. Section 3 describes the 
synthetic observations of the super-Earth scenarios. In 
Section 4, we introduce a conceptual picture of the infor- 
mation contained in transmission spectra and present nu- 
merical results. Section 5 discusses the overall approach 
to obtain atmospheric constraints from observations and 
expands on the effect of hazes and stratified atmospheres. 
We also discuss a new way of planning observations us- 
ing our atmospheric retrieval method, and comment on 
the complementarity between atmospheric retrieval and 
self-consistent modeling of atmospheres. In Section 6, we 
present a summary of our results and the conclusions. 

2. METHOD 

Our eventual aim is to characterize the atmospheres 
of exoplanets based on observations of their transmission 
spectra and without prior knowledge of their natures. 
The primary inputs to the retrieval method are observa- 
tions of the wavelength-dependent transit depth during 
the primary transit. The outputs are the best estimates 
and confidence regions of the desired atmospheric properties,
such as the mixing ratios of the molecular constituents
and the surface/cloud-top pressure. We solve
the "inverse" problem to regular atmospheric modeling, 
in which the transmission spectrum is calculated given 
a description of the composition and state of the atmo- 
sphere. 



The essential part of defining the retrieval problem is
to specify a set of parameters that both unambiguously 
defines the state of atmospheres and may be constrained 
by transit observations. We employ an atmospheric "for- 
ward" model to represent the physical relation between 
the set of atmospheric parameters and the observable 
transit depths. Given a set of observations, we retrieve 
constraints on atmospheric parameters by performing a 
Bayesian analysis using the atmospheric forward model 
and the MCMC technique. The joint posterior prob- 
ability distribution provided by the MCMC simulation 
represents the complete state of knowledge about the at- 
mospheric parameters in the light of the observational 
data. 

2.1. Atmosphere Parameterization 

We propose a parametric description of the atmosphere 
guided by the information available in exoplanet trans- 
mission spectra. Our approach is to treat the atmosphere 
near the terminator as a well-mixed, one-dimensional at- 
mosphere and describe the unknown molecular compo- 
sition, thickness, and albedo of this atmosphere by free 
parameters. The motivation for treating the atmosphere 
as well-mixed is to keep the number of parameters to a 
minimum to avoid overfitting of the sparse data avail- 
able in the near future, while ensuring that all atmo- 
spheric properties that considerably affect the retrieval 
from the spectrum are described by free parameters. 
For atmospheres with a stratified composition, the re- 
trieval method determines an altitude-averaged mixing 
ratio that best matches the observed transmission spectrum
(Section 5.3).

The unknown temperature profile at the terminator 
presents a challenge. While the pressure dependence of 
the temperature profile has only a secondary effect on 
the transmission spectrum and can likely not be retrieved 
given that the molecular composition is unknown a pri- 
ori, the temperature does affect the scale height and may 
affect the constraints on other parameters. Our approach 
is not to retrieve the temperature profile, but to account 
for the uncertainty introduced by the unknown temper- 
ature on the retrieved composition and surface pressure. 
We therefore introduce a free parameter for the planetary 
albedo and calculate the temperature profile consistent 
with the molecular composition and the planetary albedo 
for each model atmosphere. In the MCMC analysis, the 
albedo is then allowed to vary over the range of plausible 
planetary albedos. Marginalizing the posterior distribu- 
tion over all albedo values allows us to account for the 
uncertainty of the composition and surface pressure in- 
troduced by the unknown albedo. 
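
To illustrate why the unknown albedo matters for the retrieved composition, the following minimal sketch evaluates the standard equilibrium-temperature relation for full heat redistribution, T_eq = T_* sqrt(R_*/(2a)) (1 - A)^(1/4), and the resulting isothermal scale height H = k_B T / (mu m_u g). The relation, the function names, and the stellar, orbital, and gravity values (roughly those reported in the literature for the GJ 1214 system) are assumptions chosen for illustration and are not taken from this section.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
M_U = 1.66053907e-27    # atomic mass unit [kg]

def equilibrium_temperature(t_star, r_star, a_p, albedo):
    """T_eq [K], assuming full heat redistribution (an assumption of this sketch)."""
    return t_star * math.sqrt(r_star / (2.0 * a_p)) * (1.0 - albedo) ** 0.25

def scale_height(temperature, mean_molecular_mass, gravity):
    """Isothermal pressure scale height [m]; mean molecular mass in u."""
    return K_B * temperature / (mean_molecular_mass * M_U * gravity)

# Illustrative, approximate values (assumed, not from the text).
T_STAR = 3026.0              # stellar effective temperature [K]
R_STAR = 0.211 * 6.957e8     # stellar radius [m]
A_P = 0.0143 * 1.496e11      # semi-major axis [m]
GRAVITY = 8.9                # planetary gravity [m/s^2]

for albedo in (0.0, 0.3, 0.6):
    t_eq = equilibrium_temperature(T_STAR, R_STAR, A_P, albedo)
    h_light = scale_height(t_eq, 2.3, GRAVITY) / 1e3    # hydrogen-rich gas
    h_heavy = scale_height(t_eq, 18.0, GRAVITY) / 1e3   # water-dominated gas
    print(f"A={albedo:.1f}  T_eq={t_eq:6.1f} K  "
          f"H(mu=2.3)={h_light:6.1f} km  H(mu=18)={h_heavy:5.1f} km")
```

Because the scale height depends on both the temperature and the mean molecular mass, an albedo left free in the MCMC analysis broadens the constraints on the composition rather than biasing them.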

Our proposed model has the following free parameters. 

Volume mixing ratios of atmospheric constituents — We 
parameterize the composition of the atmosphere by the 
volume mixing ratios of all plausibly present molecular 
species. The volume mixing ratio X_i (or equivalently
the mole fraction) is defined as the number density of
the constituent, n_i, divided by the total number density
of all constituents in the gas mixture, n_tot. No assumptions
on the elemental composition, chemistry, or formation
and evolution arguments are made. In contrast
to the work on hot Jupiters by Madhusudhan & Seager
(2009), we cannot assume a hydrogen-dominated atmosphere.
We therefore reparameterize the mixing ratios
with the centered-log-ratio transformation described in
Section 2.3.5. The transformation ensures that all molecular
species are treated equally and no modification is
required when applying the retrieval method to atmo- 
spheres with different main constituents. 
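
As a rough illustration of the idea, the sketch below implements the standard centered-log-ratio (clr) transform from compositional data analysis, xi_i = ln(X_i / g(X)) with g(X) the geometric mean, together with its inverse. This is the textbook definition and is only assumed to correspond to the transformation used in this work; the function names and the example composition are hypothetical.

```python
import math

def clr(mixing_ratios):
    """Centered-log-ratio transform: xi_i = ln(X_i / geometric_mean(X))."""
    logs = [math.log(x) for x in mixing_ratios]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def clr_inverse(xi):
    """Map clr coordinates back to mixing ratios that sum to unity."""
    shift = max(xi)                       # subtract the maximum for stability
    weights = [math.exp(value - shift) for value in xi]
    total = sum(weights)
    return [w / total for w in weights]

composition = [0.90, 0.09, 0.009, 0.001]  # hypothetical four-gas mixture
xi = clr(composition)
recovered = clr_inverse(xi)
print([round(v, 3) for v in xi])
print([round(v, 6) for v in recovered])
```

Every species enters the transform symmetrically, and any vector of clr coordinates maps back to a valid composition, which is why no gas has to be singled out as the background constituent.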

Surface or cloud deck pressure — We introduce the "surface"
pressure P_surf as a free parameter, where the surface
is either the ground or an opaque cloud deck. Solid
surfaces and opaque cloud decks have the same effect 
on the transmission spectrum and we cannot discrimi- 
nate between them. Our parameterization of the surface 
is applicable for rocky planets with a thin atmosphere 
as well as planets with a thick gas envelope. For thick 
atmospheres, for which there is no surface affecting the
transmission spectrum, the inference of the surface pres- 
sure parameter provides a lower bound on the thickness 
of the cloud-free part of the atmosphere. 

Planet-to-star radius ratio parameter — We define the 
planet-to-star radius ratio parameter, R_{P,10}/R_*, as the
planetary radius at the 10 mbar pressure level, R_{P,10},
divided by the radius of the star, R_*. Our approach to
define the planetary radius at a fixed pressure level rather 
than at the surface avoids degeneracy between the plan- 
etary radius and the surface pressure for optically thick 
atmospheres for which the surface pressure cannot be 
constrained. It enables us to perform the retrieval for all 
types of planets without knowing a priori whether or not 
the planet has a surface. For planets with a surface pres- 
sure lower than 10 mbar, we still model an atmosphere 
down to the 10 mbar level and consider layers at pressure 
levels with P > P_surf to be opaque.

Planetary albedo — While the goal is not to infer the plan- 
etary albedo, Ap, we wish to account for the uncertainty 
in the retrieved mixing ratios and surface pressure in- 
troduced by the unknown planetary albedo and equilib- 
rium temperature. We therefore define the albedo as a 
free-floating parameter and assign a prior to the albedo 
parameter that reflects our ignorance of the albedo. 

Fixed input parameters — Additional input parameters 
that are fixed in this study are the radius of the star, R_*,
the planetary mass known from radial velocity measurements,
M_p, and the semi-major axis of the planet's orbit,
a_p. The effect of the uncertainties associated with these
parameters on the retrieval results may be accounted for 
by letting the parameters float and assigning them a prior 
distribution. 

2.2. Atmospheric "Forward" Model 

The objective of the atmospheric "forward" model is to 
generate transmission spectra for a wide range of differ- 
ent atmospheric compositions and thicknesses. Given a 
set of atmospheric parameters (Section 2.1), our model
uses line-by-line radiative transfer in local thermodynamic
equilibrium, hydrostatic equilibrium, and a temperature
profile consistent with the molecular composition
to determine the transmission spectrum. The output
of each model run is a high-spectral resolution transmis- 
sion spectrum as well as simulated instrument outputs 
given the response functions of the instrument channels 
used in the observations. To obtain convergence of the
posterior probability distribution in the MCMC inference,
the model must efficiently generate ~10^5 atmospheric
model spectra.

2.2.1. Opacities 

Molecular absorption — We determine the monochromatic
molecular absorption cross sections from the HITRAN
database (Rothman et al. 2009) below 800 K.
At temperatures higher than 800 K, we account for the
high-temperature transitions of the gases H2O, CO2, and
CO using the HITEMP database (Rothman et al. 2010).
We account for H2-H2 collision-induced absorption using
opacities from Borysow (2002).

To speed up the evaluation of a large number of at- 
mospheric models, we first determine the wavelength- 
dependent molecular cross sections for each of the consid- 
ered molecular species on a temperature and log-pressure 
grid and then interpolate the cross section for the re- 
quired conditions. In the upper atmosphere, spectral 
lines become increasingly narrow, requiring a very high 
spectral resolution to exactly capture the shapes of the
thin Doppler-broadened lines (Goody & Yung 1995). Instead
of ensuring that each line shape at low pressure
is represented exactly, we choose an appropriate spectral 
resolution for the line-by-line simulation by ensuring that 
the simulated observations are not altered by more than 
1% of the observational error-bar when the spectral res- 
olution is doubled or quadrupled. While there are many 
methods proposed in the literature to reduce the computation
time (e.g., correlated-k methods and band models;
Goody & Yung 1995), the accuracy of such methods is
hard to assess when the atmospheric composition is com- 
pletely unknown a priori. 
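
The grid lookup described above can be sketched as a bilinear interpolation in temperature and log-pressure. The example handles a single wavelength; the grid values, the clamping at the grid edges, and the function name are assumptions for illustration and do not reproduce the actual tables used in this work.

```python
import bisect
import math

def interpolate_cross_section(temp, pressure, temp_grid, logp_grid, table):
    """Bilinear interpolation of table[i][j] in T and log10(P).

    table[i][j] is the cross section at temp_grid[i] and logp_grid[j].
    Queries outside the grid are clamped to the grid edges.
    """
    t = min(max(temp, temp_grid[0]), temp_grid[-1])
    q = min(max(math.log10(pressure), logp_grid[0]), logp_grid[-1])
    i = min(max(bisect.bisect_right(temp_grid, t) - 1, 0), len(temp_grid) - 2)
    j = min(max(bisect.bisect_right(logp_grid, q) - 1, 0), len(logp_grid) - 2)
    wt = (t - temp_grid[i]) / (temp_grid[i + 1] - temp_grid[i])
    wp = (q - logp_grid[j]) / (logp_grid[j + 1] - logp_grid[j])
    low = (1.0 - wt) * table[i][j] + wt * table[i + 1][j]
    high = (1.0 - wt) * table[i][j + 1] + wt * table[i + 1][j + 1]
    return (1.0 - wp) * low + wp * high

TEMP_GRID = [300.0, 600.0, 900.0]      # K (hypothetical grid)
LOGP_GRID = [-4.0, -2.0, 0.0]          # log10(P / bar) (hypothetical grid)
TABLE = [[1.0, 2.0, 3.0],              # placeholder cross sections,
         [2.0, 4.0, 6.0],              # arbitrary units
         [4.0, 8.0, 12.0]]

print(interpolate_cross_section(450.0, 1.0, TEMP_GRID, LOGP_GRID, TABLE))
```

In a full line-by-line calculation the same lookup would be applied to every wavelength point and every species, which is what makes precomputing the grid worthwhile.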

Rayleigh scattering — The Rayleigh scattering cross section,
σ_{R,i}, for a molecular species i can be expressed in
cgs units as

    \sigma_{R,i}(\nu) = \frac{24\pi^3\nu^4}{N^2} \left( \frac{n_\nu^2 - 1}{n_\nu^2 + 2} \right)^2 F_{k,i}(\nu),    (1)

where ν is the wavenumber in cm^-1, N is the number
density in cm^-3, n_ν is the refractive index of the gas
at the wavenumber ν, and F_{k,i}(ν) is the King correction
factor. The scattering cross section of the gas
mixture, σ_R(ν) = Σ_i X_i σ_{R,i}(ν), is the weighted average
over all major atmospheric constituents. The
refractive indices and King correction factor functions
of N2, CO, CO2, CH4, and N2O are taken from
Sneep & Ubachs (2005), while the refractive index for
H2O is taken from Schiebener et al. (1990).
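
A minimal numerical sketch of the Rayleigh cross section follows. It uses the standard expression given by Sneep & Ubachs (2005), sigma = 24 pi^3 nu^4 / N^2 * ((n^2 - 1)/(n^2 + 2))^2 * F_k, which is assumed here to be the form intended above. The refractive indices are rounded placeholders held constant with wavenumber, the King factor defaults to one, and the reference density is an assumed value, so the numbers are indicative only.

```python
import math

def rayleigh_cross_section(wavenumber, refractive_index, number_density,
                           king_factor=1.0):
    """Rayleigh cross section [cm^2] in cgs units.

    wavenumber in cm^-1; number_density in cm^-3 is the gas density at
    which the refractive index applies.
    """
    n2 = refractive_index ** 2
    lorentz = ((n2 - 1.0) / (n2 + 2.0)) ** 2
    return (24.0 * math.pi ** 3 * wavenumber ** 4 / number_density ** 2
            * lorentz * king_factor)

def mixture_cross_section(mixing_ratios, cross_sections):
    """Mixing-ratio-weighted average over the constituents."""
    return sum(x * s for x, s in zip(mixing_ratios, cross_sections))

N_REF = 2.546899e19                       # cm^-3, assumed reference density
INDICES = {"N2": 1.00028, "CO2": 1.00042}  # rounded placeholder values

for wavelength_um in (0.4, 0.8):
    nu = 1.0e4 / wavelength_um            # wavenumber [cm^-1]
    sigmas = [rayleigh_cross_section(nu, n, N_REF) for n in INDICES.values()]
    print(wavelength_um, mixture_cross_section([0.8, 0.2], sigmas))
```

With constant refractive indices the cross section scales exactly as nu^4, which is the steep blueward rise that the retrieval exploits as the Rayleigh scattering slope.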

Clouds — Our model accounts for the potential presence 
of an opaque cloud deck whose upper surface's altitude 
is described by a free retrieval parameter. We assume 
a wavelength-independent, sharp cutoff of grazing light 
beams at the upper end of the cloud deck. The assump- 
tion of a sharp cutoff reasonably captures the effects of 
typical condensation cloud layers because, at ultra-violet 
to near-infrared wavelengths, such cloud layers become 
opaque on length-scales that are small compared to the 
uncertainty in the radius measurements probed by the 
transit observation. The motivation behind modeling the 
clouds as a sharp cutoff of grazing light beams is to obtain
a zeroth-order model capturing the main effects of clouds
on the transmission spectrum while using only one free
parameter for clouds in the retrieval.

2.2.2. Temperature-Pressure Profile 

We use the analytical description for irradiated planetary
atmospheres by Guillot (2010) with convective adjustments
to approximate a temperature profile that is
self-consistent with the atmospheric opacities and Bond 
albedo of each model atmosphere. The motivation be- 
hind this gray-atmosphere approach is that (1) its com- 
putational efficiency allows us to obtain temperature- 
pressure profiles consistent with the molecular composi- 
tion for a large number of model atmospheres and (2) the 
uncertainties in the atmospheric temperature are dom- 
inated by the uncertainties in the albedo rather than 
model errors.

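The Guillot (2010) model describes the horizontally
averaged temperature profile T as a function of optical
depth, τ, by

    T^4 = \frac{3 T_{int}^4}{4} \left[ \frac{2}{3} + \tau \right] + \frac{3 T_{eq}^4}{4} \left[ \frac{2}{3} + \frac{2}{3\gamma} \left( 1 + \left( \frac{\gamma\tau}{2} - 1 \right) e^{-\gamma\tau} \right) + \frac{2\gamma}{3} \left( 1 - \frac{\tau^2}{2} \right) E_2(\gamma\tau) \right],    (2)

where T_eq is the planet's equilibrium temperature, T_int
parameterizes the internal luminosity of the planet (set to
0 in this work), E_2 is the second-order exponential integral,
and γ is the ratio of the mean visible and
thermal opacities and therefore parameterizes the deposition
of stellar radiation in the atmosphere. We determine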
the mean opacities at visible and thermal wavelengths by 
averaging the line-by-line opacities weighted by the black 
body intensity at the effective star temperature and at 
the planet's equilibrium temperature, respectively. 

Given a composition of the model atmosphere and 
the planetary albedo, we iteratively determine a solution 
that is self-consistent with the molecular opacities and 
in agreement with radiative and hydrostatic equilibrium. 
In the process, we check for the onset of convective instabilities
(-dT/dz > Γ = g/c_p) delimiting the transition to the
convective layer. For the specific heat capacity Cp, we as- 
sume that the molecular constituents of the atmosphere 
are ideal gases. In the convective regime, we adopt the 
adiabatic temperature profile. Our requirement to run a 
large number of models currently does not allow us to ex- 
plicitly account for scattering and re-radiation of a solid 
surface or clouds in the calculation of the temperature 
profile. 
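
For orientation, the gray temperature profile can be evaluated with a few lines of code. The sketch below assumes the global-average form of the Guillot (2010) solution, T^4 = (3 T_int^4 / 4)(2/3 + tau) + (3 T_eq^4 / 4)[2/3 + 2/(3 gamma)(1 + (gamma tau / 2 - 1) exp(-gamma tau)) + (2 gamma / 3)(1 - tau^2 / 2) E_2(gamma tau)], and evaluates the exponential integral E_2 by simple quadrature. The function names and the parameter values in the example are hypothetical, and the convective adjustment and the iteration on the opacities are not included.

```python
import math

def exp_integral_e2(x, steps=2000):
    """E_2(x) = integral_0^1 exp(-x/u) du, evaluated with the midpoint rule."""
    if x == 0.0:
        return 1.0
    h = 1.0 / steps
    return h * sum(math.exp(-x / ((k + 0.5) * h)) for k in range(steps))

def guillot_temperature(tau, t_eq, gamma, t_int=0.0):
    """Global-average gray temperature [K] at thermal optical depth tau."""
    g_tau = gamma * tau
    internal = 0.75 * t_int ** 4 * (2.0 / 3.0 + tau)
    bracket = (2.0 / 3.0
               + 2.0 / (3.0 * gamma)
               * (1.0 + (g_tau / 2.0 - 1.0) * math.exp(-g_tau))
               + 2.0 * gamma / 3.0 * (1.0 - tau ** 2 / 2.0)
               * exp_integral_e2(g_tau))
    return (internal + 0.75 * t_eq ** 4 * bracket) ** 0.25

# Hypothetical example: T_eq = 550 K, visible-to-thermal opacity ratio 0.5.
for tau in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(tau, round(guillot_temperature(tau, 550.0, 0.5), 1))
```

In the two limits the expression reduces to T^4 = T_eq^4 (1 + gamma)/2 at the top of the atmosphere and T^4 = T_eq^4 (1 + 1/gamma)/2 at depth (for T_int = 0), which provides a quick sanity check on any implementation.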

2.2.3. Transmission Model 

The atmospheric transmission spectrum of an extra- 
solar planet can be observed when the planet passes in 
front of its host star. During this transit event, some 
of the star's light passes through the optically thin part 
of the atmosphere, leading to excess absorption at the 
wavelength at which molecular absorption or scattering 
is strong. We model the transmission spectrum following
the geometry described by Brown (2001). Given the
planetary radius parameter, R_{P,10}, and the surface pressure
parameter (Section 2.1), we calculate the radius at
the surface. Below this surface radius, we represent the
planet as an opaque disk. Above the surface radius, we
calculate the slant optical depth τ(b) as a function of the
impact parameter b by integrating the opacity through 
the planet's atmosphere along the observer's line of sight. 
We account for extinction due to molecular absorption 
and Rayleigh scattering. Light that is scattered out of 
the line of sight is assumed not to arrive at the observer. 
We then integrate over the entire annulus of the atmo- 
sphere to determine the total absorption of stellar flux as 
a function of wavelength. To assess the fit between the 
observations and the model spectrum for a given set of 
input retrieval parameters, we integrate the transmission 
spectrum over the response functions of the individual in- 
strument channels used in the observations. These sim- 
ulated instrument outputs serve in the MCMC method 
to evaluate the jump probability as described in the next 
section. 
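
The geometry described above can be sketched for a strongly simplified case: an isothermal, well-mixed atmosphere with constant gravity and a single extinction cross section per molecule at one wavelength. These simplifications, the function name, the cutoff at 15 scale heights above the reference level, and all numerical values are assumptions made for illustration; the actual forward model uses line-by-line opacities and a computed temperature profile.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]

def transit_depth(sigma, r10, r_star, p_surf, temperature, scale_h,
                  p_ref=1.0e3, n_layers=200, n_path=200):
    """Transit depth at one wavelength for a toy isothermal atmosphere.

    sigma: extinction cross section per molecule [m^2]
    r10: planetary radius at the reference pressure p_ref [m]
    p_surf, p_ref: surface (or cloud-top) and reference pressure [Pa]
    """
    r_surf = r10 + scale_h * math.log(p_ref / p_surf)   # hydrostatic balance
    r_top = max(r10, r_surf) + 15.0 * scale_h           # gas negligible above

    def density(r):
        pressure = p_ref * math.exp(-(r - r10) / scale_h)
        return pressure / (K_B * temperature)           # ideal gas [m^-3]

    db = (r_top - r_surf) / n_layers
    absorbed = 0.0
    for i in range(n_layers):
        b = r_surf + (i + 0.5) * db                     # impact parameter
        dx = math.sqrt(r_top ** 2 - b ** 2) / n_path
        column = 2.0 * dx * sum(density(math.hypot(b, (k + 0.5) * dx))
                                for k in range(n_path))
        tau = sigma * column                            # slant optical depth
        absorbed += 2.0 * b * (1.0 - math.exp(-tau)) * db
    return (r_surf ** 2 + absorbed) / r_star ** 2       # opaque disk + annulus

for sigma in (0.0, 1e-30, 1e-28, 1e-26):
    depth = transit_depth(sigma, r10=1.71e7, r_star=1.47e8, p_surf=1.0e5,
                          temperature=500.0, scale_h=2.6e4)
    print(sigma, round(depth * 1e6, 1), "ppm")
```

Raising the surface (lowering p_surf) increases the opaque disk and truncates the absorption features from below, which is the signature that allows the surface or cloud-top pressure to be retrieved.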

2.3. Atmospheric Retrieval 
2.3.1. Bayesian Analysis 

We employ the Bayesian framework using the Markov 
Chain Monte Carlo (MCMC) technique to calculate the 
posterior probability density distribution, p(x|d), of the
atmospheric parameters, x, given the measured transit
depths in each of the instrumental channels, d. According
to Bayes' theorem, the posterior distribution is

    p(x|d) = \frac{p(x)\, p(d|x)}{\int p(x)\, p(d|x)\, dx},    (3)

where p(x) represents the prior knowledge or ignorance
of the atmospheric parameters. For extrasolar
super-Earths, we currently have little or no prior knowledge
of the atmosphere and therefore aim for an appropriate
non-informative prior distribution (Sections 2.3.4
and 2.3.5). The term p(d|x) represents the probability
of measuring the transit depths, d, given that the atmospheric
parameters are x. It is modeled with the atmospheric
"forward" model (Section 2.2) and an estimate of
the uncertainty in the observed transit depths. 

2.3.2. Markov Chain Monte Carlo (MCMC)

The MCMC technique using the Metropolis-Hastings 
algorithm offers an efficient method of performing the 
integration necessary for the Bayesian analysis in Equation
(3) (Gelman et al. 2003). It has been applied to
several other astronomical data sets and problems (e.g.,
Ford 2005 and references therein). We use the MCMC
technique to determine the best estimates and Bayesian 
credible regions for the atmospheric parameters by com- 
puting the joint posterior probability distribution of the 
atmospheric parameters. The uncertainty of individ- 
ual parameters introduced by complicated, non-Gaussian 
correlations with other parameters is accounted for in a 
straightforward way by marginalizing the joint posterior 
distribution over all remaining parameters. 
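
With posterior samples in hand, marginalization amounts to ignoring all but one coordinate of each sample. The sketch below extracts the median and an equal-tailed credible interval for one parameter; the equal-tailed definition, the function name, and the synthetic samples are assumptions for illustration (a highest-posterior-density interval would be an equally valid choice).

```python
def credible_interval(samples, index, level=0.68):
    """Median and equal-tailed credible interval of one marginalized parameter.

    samples: sequence of parameter tuples drawn from the joint posterior.
    """
    values = sorted(state[index] for state in samples)

    def quantile(q):
        pos = q * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        return values[lo] + (pos - lo) * (values[hi] - values[lo])

    tail = 0.5 * (1.0 - level)
    return quantile(tail), quantile(0.5), quantile(1.0 - tail)

# Synthetic, perfectly correlated two-parameter "chain" for demonstration.
samples = [(float(i), 2.0 * i) for i in range(101)]
print(credible_interval(samples, 0))
print(credible_interval(samples, 1))
```

Correlations with the other parameters are carried implicitly by the samples, so the interval for a single parameter already includes the uncertainty they introduce.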

The goal of our MCMC simulation is to generate a
chain of states, i.e., a chain of sets of atmospheric parameters
x_i, that are sampled from the desired posterior
probability distribution p(x|d). Using the Metropolis-Hastings
algorithm, such a chain can be computed by
specifying an initial set of parameter values, x_0, and a
proposal distribution, p(x'|x_n). At each iteration, a new
proposal state x' is generated and the fit between the
transit observations and the model transmission spectrum
for the proposed set of atmospheric parameters
is computed. The new proposal state x' is then randomly
accepted or rejected with a probability that depends
on (1) the difference between the χ²-fits of the
previous state and the proposal state and (2) the difference
in the prior probability between the previous state
and the proposal state. A proposal state that leads to an
improvement in the χ²-fit and a higher prior probability
compared to the previous state is always accepted. A
proposal state that leads to a worse fit or a lower prior
probability is accepted according to the jump probability

    p(x_{n+1} = x') = \exp\left\{ -\frac{1}{2} \left[ \chi^2(x') - \chi^2(x_n) \right] \right\} \frac{p(x')}{p(x_n)},    (4)

where we assumed Gaussian uncertainty in the observations
and

    \chi^2(x) = \sum_k \frac{\left( D_{k,obs} - D_{k,model}(x) \right)^2}{\sigma_k^2}    (5)

is the measure of fit between the observed transit
depths, D_{k,obs}, and the model output, D_{k,model}, with σ_k
the uncertainty of the observation in channel k. The
probabilities p(x_n) and p(x') are the prior probabilities of
the previous and the proposal state. If the proposal state
is rejected, the previous state will be repeated in the
chain.
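
The acceptance rule can be illustrated with a one-parameter toy problem. The sketch below is a generic random-walk Metropolis sampler, not the retrieval code of this work: the "forward model" simply predicts the same transit depth in every channel, the data values, prior bounds, and step size are invented, and the jump probability is taken as min(1, exp(-Δχ²/2) p(x')/p(x_n)).

```python
import math
import random

def chi_square(x, observed, sigma, model):
    """Sum of squared, error-normalized residuals between data and model."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(observed, model(x), sigma))

def metropolis_hastings(observed, sigma, model, log_prior, x0, step,
                        n_steps, seed=0):
    """Random-walk Metropolis sampler for a single parameter (toy sketch)."""
    rng = random.Random(seed)
    x, chi2, lp = x0, chi_square(x0, observed, sigma, model), log_prior(x0)
    chain = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)        # symmetric proposal
        lp_new = log_prior(x_new)
        if lp_new > -math.inf:                  # zero prior: always rejected
            chi2_new = chi_square(x_new, observed, sigma, model)
            log_jump = -0.5 * (chi2_new - chi2) + (lp_new - lp)
            if log_jump >= 0.0 or rng.random() < math.exp(log_jump):
                x, chi2, lp = x_new, chi2_new, lp_new
        chain.append(x)                         # a rejection repeats the state
    return chain

observed = [1.0, 1.2, 0.8, 1.1, 0.9]            # invented "transit depths"
sigma = [0.2] * 5                               # invented uncertainties
model = lambda x: [x] * 5                       # toy forward model
log_prior = lambda x: 0.0 if -5.0 <= x <= 5.0 else -math.inf

chain = metropolis_hastings(observed, sigma, model, log_prior,
                            x0=0.0, step=0.2, n_steps=20000)
kept = chain[2000:]                             # discard burn-in
print(sum(kept) / len(kept))
```

For this toy problem the posterior is a Gaussian centered on the mean of the data, so the sample mean of the chain should settle near 1.0 once the burn-in is discarded.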

2.3.3. Parallel Tempering 

A simple Metropolis-Hastings MCMC algorithm can 
fail to fully explore the target probability distribution, 
especially if the distribution is multi-modal with widely 
separated peaks. The algorithm can get trapped in a lo- 
cal mode and miss other regions of the parameter space 
that contain significant probability. The trapping prob- 
lem is expected for atmospheric retrieval of extrasolar 
planets for which only very sparse data are available. 
The challenge faced is similar to the one encountered in 
finding the global minimum of a nonlinear function. 

We address the challenge of a potentially multi-modal
probability distribution by adopting a parallel tempering
scheme (Gregory 2005) for our atmospheric retrieval
method. In parallel tempering, multiple copies of the
MCMC simulation are run in parallel, each using a different
"temperature" parameter, β. The tempering distributions
are described by

    p(x|d, \beta) \propto p(x)\, p(d|x)^{\beta}, \qquad 0 < \beta \le 1.    (6)

One of the simulated distributions, the one for which
we choose β = 1, is the desired target distribution. The
other simulations correspond to a ladder of distributions
at higher temperature with β ranging between 0 and 1.
For β ≪ 1, the simulated distribution is much flatter and
a wide range of the parameter space is explored. Random
swaps of the parameter states between adjacent simulations
in the ladder allow for an exchange of information
across the different chains. In the higher temperature
distributions (β ≪ 1), radically new configurations are
explored, while lower temperature distributions (β ~ 1)
allow for the detailed exploration of new configurations
and local modes.

The final inference on atmospheric parameters is based
on samples drawn from the target probability distribution
(β = 1) only. To probe the convergence, we perform
multiple independent parallel-tempering simulations of
the target probability distribution with starting points
dispersed throughout the entire parameter space.
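
A compact sketch of the scheme is given below for a deliberately bimodal toy likelihood. It assumes tempered targets of the form p(x) p(d|x)^β and the usual swap acceptance min(1, exp[(β_i - β_j)(ln L_j - ln L_i)]) between adjacent chains; the likelihood, the temperature ladder, the step sizes, and the function names are all invented for illustration.

```python
import math
import random

def log_likelihood(x):
    """Bimodal toy likelihood: two narrow Gaussian peaks at x = -3 and x = +3."""
    a = -0.5 * ((x + 3.0) / 0.3) ** 2
    b = -0.5 * ((x - 3.0) / 0.3) ** 2
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))

def in_prior(x):
    """Uniform prior on [-10, 10]."""
    return -10.0 <= x <= 10.0

def parallel_tempering(betas, n_steps, seed=1):
    """Return the samples of the beta = 1 chain (betas[0] must equal 1)."""
    rng = random.Random(seed)
    states = [3.0] * len(betas)                 # all chains start in one mode
    logl = [log_likelihood(s) for s in states]
    cold_chain = []
    for _ in range(n_steps):
        # Metropolis update within each chain; target is p(x) * L(x)**beta.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, 0.3 / math.sqrt(beta))
            if in_prior(proposal):
                logl_new = log_likelihood(proposal)
                log_ratio = beta * (logl_new - logl[i])
                if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                    states[i], logl[i] = proposal, logl_new
        # Propose a swap between one random pair of adjacent chains.
        i = rng.randrange(len(betas) - 1)
        log_swap = (betas[i] - betas[i + 1]) * (logl[i + 1] - logl[i])
        if log_swap >= 0.0 or rng.random() < math.exp(log_swap):
            states[i], states[i + 1] = states[i + 1], states[i]
            logl[i], logl[i + 1] = logl[i + 1], logl[i]
        cold_chain.append(states[0])
    return cold_chain

samples = parallel_tempering([1.0, 0.3, 0.1, 0.03, 0.01], 20000)
fraction = sum(1 for s in samples if s > 0.0) / len(samples)
print("fraction of target-chain samples in the x > 0 mode:", fraction)
```

A single β = 1 chain started at x = +3 would essentially never cross the gap between the peaks, whereas the swaps let states found by the flat, high-temperature chains migrate into the target chain.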



2.3.4.

One challenge in atmospheric retrieval is that even
for the most simple atmospheric parameterizations, some 
parameters describing the composition and state of the 
atmosphere might not be constrained well by the obser- 
vations. In this regime, it is important to choose an 
appropriate, non-informative prior probability distribu- 
tion on the parameters. One advantage of the Bayesian 
approach over traditional frequentist approaches is that 
we can explicitly state our choice of the prior probabil- 
ity distribution. Many approaches, e.g., constant- Ax^ 
boundaries, usually implicitly assume a uniform prior. 
While in many cases the uniform prior seems like a rea- 
sonable choice, it is worth noting that the uniform prior 
is variant under reparameterization. For example, a uni- 
form prior for log (a :) will not be a uniform prior for x 
(jGelman et al.|[2003f ). therefore the obtained results can 
depend on the choice of parameterization. 

In this work, we use a uniform prior on the radius
ratio parameter, (R_p/R_*)_10, and the planetary albedo,
A. The surface pressure, P_surf, is a "scale parameter"
for which we do not know the order of magnitude a priori.
We therefore choose a Jeffreys prior for the surface
pressure, i.e., a uniform prior for log(P_surf). Since an
infinite surface pressure may agree with the observational
data in the same way a sufficiently high finite value does,
the posterior distribution can remain unnormalizable unless
a normalizable prior distribution is chosen. To ensure
a normalizable posterior, we set an upper bound on
the prior at P_surf = 100 bar. Higher surface pressures are
not considered because atmospheres of plausible compositions
will be optically thick to the grazing starlight at
higher pressure levels.
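
The difference between a uniform prior in P_surf and a uniform prior in log(P_surf) can be made concrete with a few random draws. In the sketch below, the upper bound of 100 bar is taken from the text, while the lower bound of 10^-3 bar is a hypothetical value chosen only to make the prior proper for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 100_000
p_min = 1e-3    # bar; hypothetical lower bound (not specified in the text)
p_max = 100.0   # bar; upper bound quoted in the text

# Uniform prior in log(P_surf), i.e., equal weight per decade of pressure
p_log_uniform = 10.0 ** rng.uniform(np.log10(p_min), np.log10(p_max), n_draws)

# Uniform prior in P_surf itself
p_uniform = rng.uniform(p_min, p_max, n_draws)

# Prior probability assigned to surface pressures below 1 bar
frac_below_1bar_log = np.mean(p_log_uniform < 1.0)
frac_below_1bar_lin = np.mean(p_uniform < 1.0)
```

With these bounds, three of the five decades lie below 1 bar, so the log-uniform prior places about 60% of its weight there, whereas the linear prior places only about 1% there. This is the sense in which the choice of parameterization can affect the result when the data constrain the parameter weakly.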

The mixing ratios of the molecular gases are also scale
parameters, suggesting that the use of a Jeffreys prior
for each of the mixing ratios would be appropriate. The
constraint that the mixing ratios must add up to unity,
however, prevents the assignment of a Jeffreys prior to
the individual mixing ratios. We therefore introduce a
reparameterization as discussed in the following section.

2.3.5. Centered-Log-Ratio Transformation for Mixing Ratios 
of Atmospheric Constituents 

Since the mixing ratios of the molecular species in the
atmosphere represent parts of a whole, they must satisfy
the constraints

    0 < X_i < 1 \qquad (7)

and

    \sum_{i=1}^{n} X_i = 1, \qquad (8)

where n is the number of gases in the atmosphere. For
the statistical analysis of the mixing ratios, it is important
to recognize that the sample space of a composition
is not the full Euclidean space R^n, for which most
statistical tools were developed, but only the restricted
(n-1)-dimensional space formally known as the simplex
of n parts, S^n. The simplex includes only sets of mixing
ratios for which the components sum to 1. As a result,
the total number of free parameters describing the
molecular composition is reduced by one. The mixing ratio
of the nth species, X_n, can be calculated directly from
the mixing ratios X_1 ... X_{n-1}.

In this subsection, we present a reparameterization for
the mixing ratios that allows for efficient sampling of the
full simplex with MCMC, while ensuring that all n molecular
species may range across the complete detectable
range, e.g., 10^{-…} < X_i < 1, with a non-zero prior
probability, and that the results are permutation invariant,
i.e., independent of which molecule was chosen to be the
nth species.

Previous work on atmospheric retrieval implicitly accounted
for the constraints in Equations (7) and (8) by
using free parameters only for mixing ratios of the minor
atmospheric gases and assuming that the remainder
of the atmosphere is filled with the a priori known main
constituent. This approach is feasible for the inference of
gas mixing ratios in hot Jupiters and Solar System planets
because the main constituents of the atmospheres,
i.e., H2, are known a priori. In a Bayesian retrieval
approach in which we do not know the main constituent
of the atmosphere, however, parameterizing the abundances
of minor species and assuming a main species is
unfavorable. Assigning a Jeffreys prior, i.e., a uniform
prior on the logarithmic scale, for n - 1 mixing ratios
leads to a highly asymmetric prior that favors a high
abundance of the nth species (Figure 1 (top)). In addition,
in cases with a low abundance of the nth species,
we find that the asymmetric parameterization leads to
serious convergence problems in the numerical posterior
simulation with MCMC.

To circumvent the drawbacks of highly asymmetric priors
in the interpretation of the results as well as the
numerical convergence problems due to a highly asymmetric
parameterization, we use the centered-log-ratio
transformation to reparameterize the composition (Aitchison
1986; Pawlowsky-Glahn & Egozcue 2006). The
centered-log-ratio transformation is commonly used in geology and
social sciences for the statistical analysis of compositional
data (e.g., Pawlowsky-Glahn & Egozcue 2006). We find
that it also enables the MCMC technique to efficiently
explore the posterior distribution of the atmospheric
composition across the complete simplicial sample space.

For a mixture of n gases, the centered-log-ratio
transformation of the ith molecular species is defined as

    \xi_i = \mathrm{clr}(X_i) = \ln \frac{X_i}{g(X)}, \qquad (9)

where

    g(X) = \left( \prod_{j=1}^{n} X_j \right)^{1/n} \qquad (10)

is the geometric mean of all mixing ratios X_1 ... X_n.

Fig. 1. — Marginalized prior probability distribution for the mixing
ratios in a mixture of three gases. The upper panel illustrates
the prior probabilities for a parameterization in which the gases
H2O and CO2 are described by free parameters and H2 is set to
fill the remainder of the atmosphere. Assigning a Jeffreys prior, i.e.,
dP/d ln(X_i) = const., to all gases except one leads to a description
that is permutation variant and strongly favors compositions with
a high abundance of the remaining gas. Compositions with a low
amount of H2 are excluded by the prior because the prior probability
rapidly approaches zero for X_H2 < 1%. The bottom panel
shows the prior probabilities for the mixing ratios using the
centered-log-ratio transformation introduced in this work. The prior
probability for all gases in the mixture is identical, thereby providing
permutation-invariant results. The prior probability distribution of
all gases approaches the Jeffreys prior at mixing ratios below ~20%
and is, therefore, highly favorable for scale parameters. The divergence
to infinity is only of theoretical nature. Once the signature
of one gas is detected in the spectrum, the posterior probability of
all other gases at X_i = 100% will go to zero.

Each of the compositional parameters ξ_i may range
between -∞ and +∞, where the limit ξ_i → -∞ indicates
that the ith species is of extremely low abundance with
respect to the other molecular species, while ξ_i → +∞
indicates that the ith species is abundant in the atmosphere.
The composition ξ = [0, 0, ..., 0] describes the
center of the simplex at which all molecular species are
equally abundant (Figure 2). The only constraint on the
transformed compositional parameters is Σ_{i=1}^{n} ξ_i = 0.
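
The transformation and its inverse can be written in a few lines. The sketch below follows the definition of the centered-log-ratio parameters given above; the function names are hypothetical, and the example composition is the nitrogen-rich scenario of Section 3.1.

```python
import numpy as np

def clr(x):
    """Centered-log-ratio transform of mixing ratios x (all > 0, summing to 1)."""
    log_x = np.log(np.asarray(x, dtype=float))
    # ln(X_i / g) with g the geometric mean, i.e., ln X_i minus the mean of ln X
    return log_x - log_x.mean()

def clr_inverse(xi):
    """Map a full vector of clr parameters back to mixing ratios on the simplex."""
    xi = np.asarray(xi, dtype=float)
    w = np.exp(xi - xi.max())   # subtracting the maximum avoids overflow
    return w / w.sum()

def expand_free_parameters(xi_free):
    """Recover xi_n from the n-1 free parameters using sum(xi) = 0."""
    xi_free = np.asarray(xi_free, dtype=float)
    return np.append(xi_free, -xi_free.sum())

# N2, CH4, CO2, H2O mixing ratios of the "hot nitrogen-rich world" scenario
x = np.array([0.954, 0.035, 0.010, 0.001])
xi = clr(x)
x_back = clr_inverse(expand_free_parameters(xi[:-1]))
```

Because the inverse normalizes by the sum of exp(ξ_i), any vector of clr parameters maps to a composition that sums to unity, so an MCMC step in the free parameters never leaves the simplex.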

A fully permutation-invariant description is obtained
by using the centered-log-ratio transformed mixing ratios
ξ_1 ... ξ_{n-1} as the free parameters and assigning a uniform
prior for all vectors in the ξ-space for which X_i > 10^{-…}
for all i = 1 ... n. As the distances in the sample space
spanned by ξ_1 ... ξ_{n-1} scale with the differences in ln X_i
for small mixing ratios, the MCMC can efficiently sample
the complete space, even if the mixing ratios vary over
several orders of magnitude. When transformed back into
the Euclidean space of the mixing ratios, X_i, we obtain
prior probabilities for each of the mixing ratios that have
the properties of a Jeffreys prior for X_i < 20% (Figure
2). The properties of a Jeffreys prior are highly favorable
for scale parameters such as the mixing ratios. The
increase in the prior toward mixing ratios ≳ 0.1 is a direct
consequence of the fact that one or more molecular
species in the atmosphere inevitably needs to have a
significant abundance. The divergence toward infinity at
log X_i = 0 (i.e., X_i = 100%) is of no practical relevance.
If a single molecular species is detectable in the spectrum,
then mixing ratios of 100% are excluded for all other
species by the data.
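
The contrast between the two parameterizations can be checked numerically for a mixture of three gases. In the sketch below, the lower limit of 10^-10 on each mixing ratio and the sampling box of half-width 16 are assumptions made for illustration; the box is chosen large enough to enclose all clr vectors that satisfy this limit for n = 3.

```python
import numpy as np

rng = np.random.default_rng(0)
x_min = 1e-10        # hypothetical lower limit on each mixing ratio
n_draws = 200_000

# (a) "Fill gas" parameterization: log-uniform priors on X_1 and X_2,
#     with X_3 filling the remainder of the atmosphere
x12 = 10.0 ** rng.uniform(np.log10(x_min), 0.0, size=(n_draws, 2))
x3 = 1.0 - x12.sum(axis=1)
keep = x3 > x_min
fill = np.column_stack([x12[keep], x3[keep]])

# (b) Uniform prior in clr space: draw (xi_1, xi_2) uniformly, set
#     xi_3 = -(xi_1 + xi_2), and reject compositions with any X_i <= x_min
half_width = 16.0
xi12 = rng.uniform(-half_width, half_width, size=(n_draws, 2))
xi = np.column_stack([xi12, -xi12.sum(axis=1)])
w = np.exp(xi - xi.max(axis=1, keepdims=True))
x_clr = w / w.sum(axis=1, keepdims=True)
x_clr = x_clr[(x_clr > x_min).all(axis=1)]

median_fill = np.median(fill, axis=0)   # strongly asymmetric between gases
mean_clr = x_clr.mean(axis=0)           # identical for all gases, up to noise
```

In case (a) the median of the fill gas lies close to 100% while the other two gases have medians many orders of magnitude lower. In case (b) all three gases have the same prior mean of one third, reflecting the permutation invariance discussed above.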




Fig. 2. — Simplicial sample space for a mixture of three gases
illustrated in a ternary diagram. Using the centered-log-ratio
transformation for the mixing ratios of the atmospheric gases, we
obtain a symmetric parameterization of the composition in which all
molecular species are treated equally, while simultaneously ensuring
that the sum of the mixing ratios is unity. The zero point of
the transformed mixing ratios, ξ_i, is at the center of the simplex.
Toward the edges of the sample space, i.e., for low mixing ratios
of one or more gases, the differences in the transformed mixing
ratio, ξ_i, scale with ln(X_i). The scaling provides a region in which
the prior probability dP/d ln(X_i) remains constant (red) and
allows the MCMC to efficiently sample down to exponentially small
mixing ratios for all molecular species.

2.4. Inputs 

The primary inputs to the retrieval method presented
here are spectral and/or photometric observations of the
wavelength-dependent transit depths during the primary
transit, (R_p/R_*)^2. Accurate estimates of the observational
error bars are of particular importance because
they can significantly affect the constraints and conclusions
made from the observations. Ideally, the spectral
data would not be binned to reduce the apparent error
bars. Binning of spectral data always leads to a loss of
information and should only be done if it is required to
compensate for systematics.

The spectra from the primary transit can be aug- 
mented by secondary eclipse measurements constraining 
the planetary albedo (or the atmospheric temperature) 
by including the information in the prior probability dis- 
tribution. An improved estimate of the temperature or 
planetary albedo can lead to improved constraints in 
all composition parameters. If no such observations are 
available, the retrieved uncertainties in the composition 
will fully account for the uncertainty in the planetary 
albedo. 

2.5. Outputs 

The output of the atmospheric retrieval is the posterior
probability density distribution, p(x|d), of the retrieval
parameters discussed in Section 2.1. This multidimensional
distribution encodes our complete state of knowledge
of the atmospheric parameters in the light of the
available observations. To illustrate our state of knowledge
of a single parameter, we marginalize the posterior
density distribution over all remaining parameters.
For well-constrained parameters, one can summarize our
knowledge of the parameter in just a few numbers, i.e.,
the most likely estimates and a set of error bars and
correlation coefficients. Depending on the nature of the
observational data, however, we may obtain a posterior
distribution that is not well described by a single best
estimate plus the uncertainty around this estimate. For
example, a multi-modal distribution would be indicative of
multiple possible solutions. Highly asymmetric posterior
distributions or only one-sided bounds will be obtained
if the observations constrain the parameter only in one
direction.

We can also compute the constraints on atmospheric
properties that do not serve as free parameters in our
retrieval methods, such as the mean molecular mass, the
total atmospheric mass above the surface, the mixing ratios
by mass, or the elemental abundances. A set of the
retrieval parameters introduced in Section 2.1 entirely
describes the state of well-mixed, one-dimensional atmospheres.
For each set of retrieval parameters in the chain
obtained from the MCMC simulations, we can, therefore,
compute any other atmospheric property from the retrieval
parameters. In this way, we obtain a new chain
for the desired atmospheric property that, interpreted as
a sample from the marginalized distribution of the atmospheric
property, can be used to infer constraints on the
atmospheric properties by comparing the distribution to
the equivalently obtained prior distribution.

In this work, we present constraints on the mean
molecular mass, total atmospheric mass above the cloud
deck/surface, and relative elemental abundances (Equations
10-12),
where q_j is the relative abundance of the elemental
species j, X_i is the mixing ratio of molecule i, and n_{i,j} is
the number of atoms of elemental species j in molecule
i.
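
As an illustration of how such derived quantities follow from the retrieval parameters, the sketch below computes the mean molecular mass and relative elemental abundances from a set of volume mixing ratios. It assumes that the mean molecular mass is the mixing-ratio-weighted sum of molecular masses and that q_j is normalized by the total number of atoms; the function names and the two-sample chain are hypothetical.

```python
# Molar masses in g/mol and atom counts n_{i,j} per molecule
MOLAR_MASS = {"H2O": 18.015, "CO2": 44.010, "H2": 2.016,
              "CH4": 16.043, "N2": 28.014}
ATOMS = {
    "H2O": {"H": 2, "O": 1},
    "CO2": {"C": 1, "O": 2},
    "H2": {"H": 2},
    "CH4": {"C": 1, "H": 4},
    "N2": {"N": 2},
}

def mean_molecular_mass(x):
    """Mixing-ratio-weighted mean of the molecular masses."""
    return sum(ratio * MOLAR_MASS[mol] for mol, ratio in x.items())

def elemental_abundances(x):
    """Relative number of atoms of each element (assumed normalization: sum = 1)."""
    counts = {}
    for mol, ratio in x.items():
        for element, n_atoms in ATOMS[mol].items():
            counts[element] = counts.get(element, 0.0) + ratio * n_atoms
    total = sum(counts.values())
    return {element: c / total for element, c in counts.items()}

# Composition of the "hot Halley world" scenario of Section 3.1
halley = {"H2O": 0.695, "CO2": 0.139, "H2": 0.118, "CH4": 0.026, "N2": 0.022}
mu = mean_molecular_mass(halley)
q = elemental_abundances(halley)

# Applying the same function to every sample of a (hypothetical) chain
# yields a chain for the derived property
chain = [halley, {"H2O": 0.5, "CO2": 0.2, "H2": 0.2, "CH4": 0.05, "N2": 0.05}]
mu_chain = [mean_molecular_mass(sample) for sample in chain]
```

For the Halley-like composition this gives a mean molecular mass of roughly 20 g/mol, with hydrogen as the most abundant element by number, followed by oxygen, carbon, and nitrogen.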

3. SYNTHETIC OBSERVATIONS OF SUPER-EARTH 
TRANSMISSION SPECTRA 

The goal of the quantitative analysis presented in 
this work is to explore which constraints on the at- 
mospheric properties of super-Earth exoplanets can be 
extracted from low-noise transmission spectra in the 
coming decades. In this section, we describe syn- 
thetic, low-noise observations of the transmission spec- 
tra of three different, hypothetical types of hot super- 
Earths transiting nearby M-stars as they may be ob- 
tained with the James Webb Space Telescope (JWST). 
To make the results most relevant in the context of current
observational efforts, we adopt the stellar, orbital,
and planetary parameters of the super-Earth GJ 1214b
(Charbonneau et al. 2009).

3.1. Atmospheric Scenarios

"Hot Halley world." — The first scenario we consider is
a volatile-rich super-Earth (Kuchner 2003; Léger et al.
2004). The motivation for this scenario is to investigate
the retrieval results for an atmosphere that is predominantly
composed of absorbing gases. For our specific
case, we consider a scenario in which the planet has accreted
ices with the elemental abundances of the ices
identical to those in the Halley comet in the Solar System
(Jessberger & Kissel 1991). Some of the ices may
have evaporated at the high equilibrium temperature and
formed an atmosphere around the planet. We assume a
well-mixed atmosphere around the planet whose chemical
composition is calculated from chemical equilibrium
at the 1 bar level. The resulting atmosphere is composed
of H2O (69.5%), CO2 (13.9%), H2 (11.8%), CH4
(2.6%), and N2 (2.2%). All mixing ratios are given as
volume mixing ratios. The atmosphere is assumed to be
clear and sufficiently thick such that no surface affects
the transmission spectrum in this scenario (Table 1).

"Hot nitrogen-rich world." — For the second scenario we
consider a nitrogen-dominated atmosphere representative
of a rocky planet with an outgassed atmosphere similar
to the atmospheres of Earth and Titan. The motivation
for this scenario is to investigate the retrieval results
for an atmosphere that is predominantly composed
of a spectrally inactive gas that has no directly observable
features in the spectrum. We chose an atmosphere
dominated by N2 (95.4%) and rich in CH4 (3.5%), CO2
(1%), and H2O (0.1%) with a rocky surface at 1 bar.

"Hot mini-neptune." — The third scenario is a super-Earth
with a thick hydrogen/helium envelope that has
experienced a formation history similar to those of the
giant planets. While we assume a primordial atmosphere,
we deliberately chose an example that is away from solar
abundance and chemical equilibrium. The motivation for
this scenario is to demonstrate the retrieval for a scenario
that does not correspond to our preconceived ideas. The
atmosphere we consider is composed of 84.9% H2, 13.1%
He, and 2% H2O, and has small mixing ratios of CO2
(10^{-…}) and CH4 (10^{-…}). We consider the presence of an
opaque cloud deck of unknown nature at the 100 mbar
level.

3.2. Observation Scenarios 

For each of the three atmospheric scenarios, we simulate
high-resolution transmission spectra (R > 10^{…}) and
model the output of the JWST NIRSpec instrument covering
the spectral range between 0.6 and 5 μm. We assume
that the transit depths in the individual channels
of JWST NIRSpec can be determined to within 20% of
the shot noise limit. To compute the photon flux for each
spectral channel individually, we scale the spectrum of a
typical M4.5V star (Segura et al. 2005) to the apparent
brightness of GJ 1214. For JWST, we adopt an effective
diameter of the primary mirror of 6.5 m and a throughput
before the instrument of 0.88 (Deming et al. 2009).
We consider the spectral resolution for observations using
the R = 100 CaF2 prism on NIRSpec (R = 30 ... 280).
Our noise model adopts a total optical transmission for
the NIRSpec optics after the slit of 0.4 and a quantum
efficiency for the HgCdTe detector of 0.8 (Deming et al.
2009). We do not include any slit losses because the large
aperture of JWST will encompass virtually all of the energy
in the point spread function. We find that read
noise (6 e- per Fowler 8) and dark current (0.03 e- s^-1)
are insignificant compared to photon noise. We account
for a ~20% loss of integration time due to the resetting
and reading out of the detector, based on the expected
saturation time of 0.43 s for the brightest pixels on the
NIRSpec detector for GJ 1214 (de Wit, personal communication).
For a first-order estimate of the observational
errors, we neglect the wavelength dependence of the grating
blaze function.

TABLE 1
Mixing Ratios of Molecular Constituents and Surface Pressure for the Three Super-Earth Scenarios Used to Generate
Synthetic Transmission Spectra.

Given the instrument properties, we calculate the 
expected variances of the in-transit and out-of-transit 
fluxes due to shot noise and calculate the expected error 
in the observed transit depth. We assume that the total 
observation time used to measure the baseline flux be- 
fore and after the transit equals the transit duration. We 
stack 10 synthetic transit observations for the high mean 
molecular mass atmospheres of the "hot Halley world" 
and "hot nitrogen-rich world" scenarios and use only a 
single transit observation for the more easily detectable 
hydrogen-dominated "hot mini-neptune" scenario. 
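
The error estimate described above can be sketched as a simple propagation of Poisson noise into the transit depth. The factor of 1.2 (within 20% of the shot noise limit) and the duty cycle of 0.8 (20% loss of integration time) come from the text; the photon rate, transit duration, and transit depth below are placeholder values, and the function itself is a hypothetical simplification of the full noise model.

```python
import numpy as np

def transit_depth_error(rate, t_transit, depth, n_transits=1,
                        noise_factor=1.2, duty_cycle=0.8):
    """1-sigma transit depth error for one spectral channel from shot noise.

    rate      : detected photons per second out of transit (placeholder input)
    t_transit : transit duration in seconds; the out-of-transit baseline is
                assumed to have the same total duration
    depth     : transit depth (R_p/R_*)^2 in that channel
    """
    t_eff = t_transit * duty_cycle * n_transits
    n_out = rate * t_eff                     # photons collected out of transit
    n_in = rate * (1.0 - depth) * t_eff      # photons collected in transit
    flux_ratio = 1.0 - depth                 # F_in / F_out
    # Depth = 1 - F_in/F_out; relative Poisson errors add in quadrature
    sigma = flux_ratio * np.sqrt(1.0 / n_in + 1.0 / n_out)
    return noise_factor * sigma

# Placeholder numbers for a single channel
sigma_1 = transit_depth_error(rate=1.0e6, t_transit=3000.0, depth=0.0135)
sigma_10 = transit_depth_error(rate=1.0e6, t_transit=3000.0, depth=0.0135,
                               n_transits=10)
```

Stacking 10 transits reduces the error by a factor of sqrt(10) in this photon-limited estimate, which is the motivation for using more transits for the high mean molecular mass scenarios.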

4. RESULTS 

Our most significant finding is that a unique constraint
on the mixing ratios of the absorbing gases and up to
two spectrally inactive gases is possible with moderate-resolution
transmission spectra. Assuming a well-mixed
atmosphere and that N2 and a primordial mix of H2 + He
are the only significant spectrally inactive components,
we can fully constrain the molecular composition of the
atmosphere. We also find, however, that even a robust
detection of a molecular absorption feature (>10σ)
can be insufficient to determine whether a particular absorber
is the main constituent of the atmosphere (X_i >
50%) or just a minor species with a mixing ratio of
less than 0.1%, if we do not observe the signature of
gaseous Rayleigh scattering.

In this section, we first conceptually identify the features
in the spectrum that are required to uniquely constrain
the compositions of general exoplanet atmospheres
(Section 4.1). Based on the conceptual understanding,
we then present numerical results from the MCMC retrieval
analysis for synthetic JWST NIRSpec observations
of three scenarios for the super-Earth GJ 1214b
(Section 4.2).

4.1. Uniquely Constraining Exoplanet Atmospheres 

Identifying absorbing molecules by their spectral features
is conceptually straightforward, as molecules generally
absorb at distinct wavelengths. Constraining the
mixing ratios of the atmospheric gases is more complicated
because the observable transmission spectrum depends
not only on the mixing ratios of the absorbers, but
also on the exact planetary radius (as measured by the
radius at the reference pressure level, R_p,10), the surface
or cloud-top pressure, and the mean molecular mass of
the background atmosphere. The absorber mixing ratios
may therefore remain unconstrained over several orders
of magnitude despite strong detections of molecular
absorption features in the near-infrared wavelength range.
The difficulty in constraining the mixing ratios of the
atmospheric constituents was not discovered in previous
work on atmospheric retrieval because hot Jupiters were
assumed to be cloud-free and the mean molecular mass
of their hydrogen-dominated atmospheres was known a
priori (e.g., Madhusudhan & Seager 2009). In the following,
we explain which observables from different parts of
the spectrum must be combined to successfully constrain
the composition of a general exoplanet atmosphere.

The transmission spectrum of an atmosphere with n
relevant absorbers provides n + 4 independent observables
(Figure 3). Combined, these n + 4 observables can
be used to constrain the n unknown mixing ratios of the
absorbing gases, the mixing ratios of up to two spectrally
inactive gases (e.g., N2 and primordial H2 + He), the planetary
radius at the reference pressure level, and the pressure
at the surface or upper cloud deck. The remaining information
in the transmission spectrum is highly redundant
with the n + 4 independent observables.

The n + 4 independent observables are as follows. For
each of the n absorbers, the broadband transit depths
in the strongest features provide one independent observable.
By measuring and comparing the broadband
transit depths in the absorption features of different
molecules, we can directly determine the relative abundances
of the absorbing gases in the atmosphere (Section
4.1.1). For example, given the broadband transit depths
in the 3.3 μm CH4 feature and in the 4.3 μm CO2 feature,
we can determine that there must be "x" times more CO2
than CH4 in the atmosphere. If the feature of one molecular
absorber is not present, transit depth measurements
at wavelengths for which the absorption cross sections
of the molecular species are high can still provide an upper
limit on the absorber abundance relative to the other
absorbers.

Next, we have a total of one additional piece of information
from either (1) the linear slope of the Rayleigh
signature, (2) the shapes of individual features, or (3) the
relative transit depths in features of the same molecule.
The information from the three observables is highly redundant.
From one of the three observables, we can directly
constrain the scale height, and, given an approximate
estimate of the atmospheric temperature, we can
obtain an estimate of the mean molecular mass (Section
4.1.2). Importantly, for general atmospheres that
may contain clouds, it is the slope at which the transit
depth changes as a function of the extinction cross
section that enables us to measure the mean molecular
mass. The overall transit depth variation as currently
discussed in many papers on the super-Earth
GJ 1214b (e.g., Miller-Ricci & Fortney 2010; Bean et al.
2011; Croll et al. 2011) measures the mean molecular
mass only for cloud-free atmospheres.

Fig. 3. — Unique constraints on the atmospheric properties based on observables in the transmission spectrum. The transmission spectrum
of an atmosphere with n relevant absorbers contains n + 4 independent pieces of information that constrain the n mixing ratios of these
absorbers, up to two mixing ratios of the two spectrally inactive components H2 + He and N2, the planetary radius at a reference pressure
level, R_p,10, and the surface/cloud-top pressure. The left panel illustrates conceptually the individual observables in the transmission
spectrum that carry the n + 4 pieces of information for an example with n = 3 absorbers. For well-mixed atmospheres, the three observables
"Slope of the Rayleigh signature," "Shapes of individual features," and "Relative transit depths in features of same molecule" are redundant
and provide only one independent piece of information. Note that to uniquely constrain any of the n + 4 atmospheric properties on the far
right, all n + 4 pieces of information need to be available, unless additional assumptions are made.

Three additional independent constraints are provided
by the transit depth offset of the Rayleigh slope, the fact
that all mixing ratios must sum to 1, and the measure
of the lowest transit depths in the spectrum. Comparing
the transit depth offset of the Rayleigh slope and the
transit depths at near-infrared wavelengths provides us
with a measure of the amount of spectrally inactive gas
in the atmosphere. Given all previously discussed observables,
the lowest transit depths in the spectrum allow
us to independently constrain the surface/cloud-top
pressure (Section 4.1.4). If the surface/cloud-top is at a
deep layer in the atmosphere and the molecular opacities
across the observed wavelength range are high, a direct
detection of a surface is not possible. In this case, the
minimum transit depth will provide a lower limit on the
surface/cloud-top pressure.

Note that we need all n + 4 observables together in
order to determine any of the atmospheric parameters
uniquely. If a single piece of the puzzle is missing, e.g.,
the transit depths at short wavelengths are not observed,
then the composition, including the volume mixing ratios
of the absorbers, will stay weakly constrained, even if we
have detected the feature with high significance.

4.1.1. Relative Abundances of Absorbing Gases 

The infrared part of the transmission spectrum pro- 
vides a good tool to constrain the relative abundances of 
the molecular absorbers. Constraining the absolute value 
of the volume mixing ratios, however, might not be pos- 
sible to within orders of magnitude, even with low-noise 
observations capturing the shapes of the absorption fea- 
tures because the infrared part of the spectrum lacks an 
absolute reference for the transit depth. 

The measured transit depths in the absorption features
are mainly related to the number density of the absorbing
molecule, n_i(r), as a function of the radius from the center
of the planet, r. The function n_i(r), however, provides
little useful insight unless we are able to determine a
surface radius or the number density of a second gas for
comparison. In other words, if we do not detect a surface,
then only the mixing ratios of the atmospheric gases
have a meaningful interpretation, not the absolute number
densities, because we are missing an absolute pressure
scale. Obtaining the mixing ratios of an absorbing gas
directly by observing the absorption features of this gas
is complicated, however, because different combinations
of the absorber mixing ratio, X_i, and the planetary radius,
R_p,10, can lead to the same number density, n_i(r),
and, therefore, to virtually the same absorption feature
shape. To constrain the mixing ratio of a particular gas
independently, a reference for the planetary radius needs
to be obtained from a different part of the spectrum.

Fig. 4. — Degeneracy between the absorber mixing ratio, X_CO2, and the planetary radius at the reference pressure level, R_p,10. The left
panel illustrates the modeled transmission spectra in the 4.3 μm CO2 absorption feature for two different atmospheric compositions. The
atmospheric composition of scenario 1 (red) is 10% CO2 and 90% N2. For the same planetary radius, the transit depth in the absorption
feature of scenario 2 (blue; 0.1% CO2 and 99.9% N2) is lower by ~100 ppm across the entire feature. Increasing the planetary radius for
scenario 2, however, leads to a transmission spectrum (green) that closely resembles scenario 1. As a result of this degeneracy between
X_CO2 and R_p,10, the mixing ratio of CO2 cannot be determined to within several orders of magnitude even for low-noise observations of
the feature. The right panel shows the total pressures for planets with two different planetary radii (black) and the partial pressure of CO2
as a function of the distance from the planetary center (colors match left panel). Two atmospheres with different absorber mixing ratios
(red and green) can have the same partial pressure/number density as a function of distance from the planetary center, leading to similar
absorption features.

For a quantitative example, we show the 4.3 μm absorption
feature of CO2 for two different atmospheric
compositions in Figure 4. The compositions are 90% N2
and 10% CO2 for scenario 1 and 99.9% N2 and 0.1% CO2
for scenario 2. If the planetary radius, R_p,10, for the two
scenarios is the same, more starlight is blocked in scenario
1 due to the higher number density n_CO2(r). The
transit depth inside the spectral features of CO2 is therefore
higher than for scenario 2. If the planetary radius
R_p,10 in scenario 2 is increased by only 70 km (≈ 0.4%),
however, then the number density n_CO2(r) in scenario
2 equals the one in scenario 1, and the absorption feature
of scenario 1 closely resembles the absorption feature of
the new scenario 2. The remaining small difference in
the transmission spectra is due to the effect of pressure
and temperature on the absorption line broadening. The
effect of changes in line broadening is of secondary order,
though, which makes the distinction between scenarios
1 and 2 difficult, even with extremely low-noise observations.
Determining the mixing ratio of CO2 by observing
only CO2 features is, therefore, highly impractical.
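
The size of the radius shift in this example can be reproduced to order of magnitude with an isothermal, hydrostatic sketch. The temperature, gravity, mean molecular mass, and reference radius below are placeholder values of roughly the right magnitude for a GJ 1214b-like planet with an N2-dominated atmosphere; they are not the values used to produce Figure 4, and the scale height is held fixed between the two scenarios for simplicity.

```python
import numpy as np

K_B = 1.380649e-23      # Boltzmann constant, J/K
AMU = 1.66053907e-27    # atomic mass unit, kg

def scale_height(temperature, mu, gravity):
    """Pressure scale height (m) of an isothermal atmosphere."""
    return K_B * temperature / (mu * AMU * gravity)

def number_density(r, x_i, r_ref, p_ref, temperature, h):
    """Absorber number density (m^-3) at radius r for a well-mixed,
    isothermal atmosphere with pressure p_ref at radius r_ref."""
    pressure = p_ref * np.exp(-(r - r_ref) / h)
    return x_i * pressure / (K_B * temperature)

# Placeholder parameters (assumptions of this sketch)
T, g, mu = 500.0, 8.9, 28.0        # K, m/s^2, amu
r_ref, p_ref = 1.7e7, 1.0e6        # m, Pa (10 bar reference level)
h = scale_height(T, mu, g)

# Radius increase that compensates a 100 times lower mixing ratio
shift = h * np.log(0.10 / 0.001)

r = r_ref + np.linspace(0.0, 3.0e5, 50)
n_scenario1 = number_density(r, 0.10, r_ref, p_ref, T, h)
n_scenario2 = number_density(r, 0.001, r_ref + shift, p_ref, T, h)
```

With these placeholder values the scale height is about 17 km and the compensating shift is a few scale heights, i.e., several tens of kilometers, comparable to the 70 km quoted above. After the shift, the two number density profiles coincide, which is why the CO2 feature alone cannot separate the two scenarios.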

A relative reference to break the degeneracy between 
the planetary radius and the mixing ratio is provided by 
the transit depths in absorption features of different ab- 
sorbers. Conceptually, this is possible because a change 
in the planetary radius affects the absorption features of 
both gases equally, while a change in the mixing ratio of 
one of the absorbers only affects the features of that ab- 
sorber. The transit depth difference between two features 
of different absorbers is independent of the planetary ra- 
dius and only dependent on the relative abundance ratios 
of the absorbers and their absorption cross sections in the 
features. As the absorption cross sections are known from
the molecular databases, comparing the transit depths in
features of different absorbers allows one to constrain the
relative abundance of these absorbers. For a numerical
example, we return to our N2-CO2 atmosphere and replace
1% N2 by CH4. In the spectral region around the
3.5 μm CH4 feature (Figure 5), the spectrum remains unaffected
by the change in the CO2 mixing ratio and can
serve as a reference to probe the relative abundances of
CO2 and CH4.

The infrared part of the transmission spectrum cov- 
ering multiple absorption features of different molecular 
species, therefore, provides good constraints on the rela- 
tive abundances of the molecular absorbers, but hardly 
contains any information on the volume mixing ratios of 
the absorbers. Even low-noise NIR observations captur- 
ing the shapes of the absorption features might not con- 
strain the absolute value of the mixing ratio to within or- 
ders of magnitude because the infrared part of the spec- 
trum provides little information on the abundances of 
spectrally inactive gases. 

In our example, the abundance ratio X_CO2/X_CH4 is constrained
to within a factor of a few at 3σ, while the
volume mixing ratio of CH4 compatible with the simulated
observation can vary over three orders of magnitude
between 0.03% and 30%. Note that the reason for the
correlation between X_CO2 and X_CH4 is not the overlap
of absorption features of the molecular species. Overlapping
features would cause an anti-correlation between
the abundances of the two absorbers.



4.1.2. Mean Molecular Mass 

It has been shown that, for clear atmospheres, measuring the change in transit depth, $\Delta D$,
across spectral features gives an order of magnitude estimate of the scale height and, therefore,
the mean molecular mass (Miller-Ricci et al. 2009). For a general atmosphere, however, the depth
of the absorption features cannot be used to constrain the mean molecular mass because clouds,
hazes, and a potentially present surface also affect the depths of spectral features. Here we show
for general atmospheres that the value of the mean molecular mass can be determined by measuring
the slope, $dR_{p,\lambda}/d(\ln\sigma_\lambda)$, with which the "observed" planet radius,
$R_{p,\lambda}$, changes as a function of the extinction cross section, $\sigma_\lambda$, across
different wavelengths. In practice, good observables to independently constrain the mean molecular
mass are (1) the slope of the Rayleigh scattering signature at short wavelengths, (2) the relative
sizes of strong and weak absorption features of the same molecule, and (3) the shape of the wings
of a strong molecular absorption feature.

For the optically thick part of the spectrum, the observed radius of the planet changes linearly
with the logarithm of the extinction cross section (Etangs et al. 2008) and the slope
$dR_{p,\lambda}/d(\ln\sigma_\lambda)$ is directly related to the atmospheric scale height, $H$:

$$ \frac{dR_{p,\lambda}}{d(\ln\sigma_\lambda)} = H = \frac{k_B T}{\mu_{\rm mix}\, g}, $$

where $k_B$ is the Boltzmann constant and $g$ is the gravitational acceleration. A measurement of
the observed planet radius, $R_{p,\lambda}$, at two or more wavelengths with different absorption
or scattering cross sections, $\sigma_\lambda$, therefore, permits the determination of the scale
height. Given an estimate of the atmospheric temperature, e.g. $T \approx T_{\rm eq}$, we can
observationally determine an estimate of the mean molecular mass

$$ \mu_{\rm mix} = \frac{k_B T}{g} \left( \frac{dR_{p,\lambda}}{d(\ln\sigma_\lambda)} \right)^{-1} \left( 1 \pm \frac{\delta T}{T} \right), $$

where the factor $(1 \pm \delta T/T)$ accounts for the inherent uncertainty due to the
uncertainty, $\delta T$, in modeling the atmospheric temperature, $T$, at the planetary radius
$r = R_{p,\lambda}$ (Appendix). Even if the uncertainty in the temperature estimate is several
tens of percent of the face value, we will find useful constraints on the mean molecular mass
because the mean molecular mass varies by a factor on the order of 8 to 20 between
hydrogen-dominated atmospheres and atmospheres mainly composed of H2O, N2, or CO2.

The most straightforward way to determine the mean molecular mass is to measure the slope of the
Rayleigh scattering signature at short wavelengths. The Rayleigh scattering coefficient varies
strongly with wavelength as $\sigma(\lambda) \propto \lambda^{-4}$. From
$\sigma(\lambda) \propto \lambda^{-4}$, we obtain

$$ \mu_{\rm mix} = \frac{4 k_B T}{g R_*} \, \frac{\ln(\lambda_1/\lambda_2)}{\sqrt{D_{\lambda_2}} - \sqrt{D_{\lambda_1}}} \left( 1 \pm \frac{\delta T}{T} \right). $$

Measuring the transit depth $D_\lambda = (R_{p,\lambda}/R_*)^2$ at two different wavelengths
$\lambda_1$ and $\lambda_2$ that are dominated by Rayleigh scattering, therefore, provides the
mean molecular mass. For a quantitative example, we show the transmission spectra of a
CO2-dominated atmosphere (95% CO2 + 5% N2) and a N2-dominated atmosphere with a small amount of
CO2 as the only absorber (0.15% CO2, 99.85% N2) in Figure 6. Despite the difference in mean
molecular mass, the feature depths are similar due to the different total amounts of the absorber
CO2; thus, the feature depth cannot be used to determine the scale height. The Rayleigh slope at
short wavelengths ($\lambda < 0.8$ μm), however, is only affected by the scale height and can
serve as a good measure of the mean molecular mass.

A second way of constraining the mean molecular mass is based on analyzing the detailed shape of
the wing and core of spectral features. The absorption cross section varies strongly from the
center to the outer wings. Measuring the detailed shape of a spectral feature at sufficient
spectral resolution, therefore, probes a large range of cross sections and constrains the mean
molecular mass. In our example, the detailed shape of the 4.3 μm CO2 feature shows the difference
between the scenarios (Figure 6). For smaller mean molecular mass, the feature is higher at the
center with narrow wings, while the large mean molecular mass leads to broader features. The
measurement of this difference requires at least a moderate spectral resolution ($R \sim 50$) and
a high signal-to-noise ratio (S/N).

A third way to probe the mean molecular mass is to quantitatively compare the broadband transit
depths in different spectral features of the same absorber. Again, we probe the planetary radius
at wavelengths for which the cross sections are different: strong absorption features have large
absorption cross sections, while weaker features of the same absorber have smaller cross sections.
A quantitative comparison of the depths of individual features therefore provides the gradient
$dR_{p,\lambda}/d(\ln\sigma_\lambda)$, and constrains the scale height and mean molecular mass.
For atmospheres with small mean molecular masses, the gradient
$dR_{p,\lambda}/d(\ln\sigma_\lambda)$ is large, resulting in greater differences in the transit
depths between the strong and the weak features (Figure 6).

Fig. 5. Constraining the relative abundances of absorbing gases from the near-infrared spectrum.
The left panel shows the transmission spectrum for the atmospheric scenarios 1 (red, 10% CO2) and
2 (blue, 0.1% CO2) as described in Figure 3 but with 1% N2 replaced by the absorbing gas CH4.
While the CO2 features have a vertical offset between scenario 1 and scenario 2, the CH4 features
are unaffected by the CO2 mixing ratio. The transit depth in CH4 features can, therefore, serve as
a relative reference for the CO2 mixing ratio. The right panel shows the two-dimensional marginal
posterior probability distribution of the mixing ratios, $X_{\rm CO_2}$ and $X_{\rm CH_4}$, as
retrieved from low-noise synthetic observations of the near-infrared spectrum ($R = 100$,
$\sigma_{(R_p/R_*)^2} \sim 20$ ppm, $\lambda = 2$-$5$ μm) of scenario 1. The solid lines indicate
the 1σ, 2σ, and 3σ credible regions. Measuring the transit depth in infrared features of CO2 and
CH4 provides good constraints on the relative abundance ratios of the two gases. The volume mixing
ratios of the gases, however, are strongly correlated. The individual mixing ratios remain
unconstrained across three orders of magnitude despite robust detections of the infrared features
and sufficient spectral resolution to observe the feature shapes.

Fig. 6. Visible to near-infrared transmission spectra for two atmospheric scenarios with similar
absorption feature sizes. The first scenario (blue) is a N2-rich atmosphere (0.15% CO2, 99.85%
N2). The second scenario (red) is CO2 dominated (95% CO2 and 5% N2). Despite the different mean
molecular mass (28 vs. 44) the infrared absorption feature sizes are similar because the
difference in the vertical extent of the atmosphere due to the different scale heights is
compensated for by the difference in the total amount of the absorbing CO2 gas. A reliable way to
determine the scale height is to measure either the Rayleigh scattering slope,
$dR_{p,\lambda}/d(\ln\lambda)$, at short wavelengths, the slope
$dR_{p,\lambda}/d(\ln\sigma_\lambda)$ in strong absorption features, or the relative feature
depths between strong and weak features of the same molecule. The lower mean molecular mass
atmosphere (blue) shows a steeper Rayleigh scattering slope, larger differences in the CO2
absorption feature depths, and narrower features than the higher mean molecular mass atmosphere
(red).
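
As a worked illustration of the Rayleigh-slope method, the following minimal sketch converts two transit depths measured at Rayleigh-dominated wavelengths into a mean molecular mass. It assumes an isothermal atmosphere with constant gravity and an exact $\lambda^{-4}$ cross section; the function name and argument names are hypothetical and not taken from the paper's code.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
AMU = 1.66053907e-27   # atomic mass unit [kg]

def mean_molecular_mass_from_rayleigh(depth_1, depth_2, lam_1, lam_2,
                                      temperature, gravity, r_star):
    """Estimate the mean molecular mass [u] from two Rayleigh-dominated depths.

    depth_1, depth_2: transit depths (R_p/R_*)^2 at wavelengths lam_1, lam_2
    (any common wavelength unit); temperature [K]; gravity [m/s^2];
    r_star: stellar radius [m].
    """
    r_p1 = r_star * math.sqrt(depth_1)
    r_p2 = r_star * math.sqrt(depth_2)
    # sigma ~ lambda^-4 implies d(ln sigma) = -4 d(ln lambda),
    # so the scale height is H = (R_p2 - R_p1) / (4 ln(lam_1 / lam_2)).
    scale_height = (r_p2 - r_p1) / (4.0 * math.log(lam_1 / lam_2))
    return K_B * temperature / (gravity * scale_height) / AMU
```

The temperature enters linearly, so a fractional error in the assumed temperature maps directly onto the same fractional error in the inferred mean molecular mass.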

4.1.3. Volume Mixing Ratios of the Atmospheric 
Constituents 

The primary quantities affected by the abundance of
spectrally inactive gases are the estimate of the mean
molecular mass, $\mu_{\rm mix}$, discussed in Section 4.1.2 and the
transit depth offset of the molecular Rayleigh scattering
slope, $D_{\rm Rayl}$. Combining the information on $\mu_{\rm mix}$ and
$D_{\rm Rayl}$ with the constraints on the relative abundances
of the absorbers from the NIR spectrum (Section 4.1.1)
provides unique constraints on the volume mixing ratios
of both the spectrally inactive gases and molecular absorbers.

The atmospheres of Jupiter-sized planets present a 
simplified case for atmospheric retrieval. From their ra- 
dius and mass measurements, we can conclude that they 
have accreted a hydrogen-dominated atmosphere, thus 
we know the mean molecular mass a priori. Constrain- 
ing the volume mixing ratios of the molecular species in 
the atmosphere, nonetheless, requires the observation of 
the transit depth offset of the molecular Rayleigh scattering
slope, $D_{\rm Rayl}$, at short wavelengths.



Neglecting for now the effect of refractive index variations
between different gas mixtures, the transit depth
offset of the molecular Rayleigh scattering slope, $D_{\rm Rayl}$,
is only a function of planetary radius at the reference
pressure, $R_{p,10}$. The transit depth offset of the Rayleigh
scattering signature can, therefore, serve as a reference
transit depth to obtain an absolute scale for the atmospheric
pressure and to determine the volume mixing ratios
of the absorbers from the absorption features in the
NIR. In general, atmospheres rich in absorbing gases will
show transmission spectra for which the transit depth in
the Rayleigh scattering signature is small with respect to
the transit depth in the NIR, while atmospheres dominated
by spectrally inactive gases will show transmission
spectra that have strong Rayleigh scattering signatures
and absorption features in the NIR at lower transit
depth levels.

Obtaining the absolute abundances for all relevant absorbing
gases enables us to constrain the total mixing
ratio of the spectrally inactive gases to be
$X_{\rm inactive} = 1 - \sum_{i=1}^{n} X_i$, where $n$ is the number of absorbers in
the atmosphere. Conceptually, the estimate of the mean
molecular mass, $\mu_{\rm mix}$, can then be used to determine the
individual mixing ratios of the spectrally inactive components,
N2 and H2 + He. We obtain the volume mixing
ratios of N2 and primordial gas from the closure of the mixing
ratios and the definition of the mean molecular mass,

$$ X_{\rm N_2} + X_{\rm H_2+He} = X_{\rm inactive}, \qquad X_{\rm N_2}\,\mu_{\rm N_2} + X_{\rm H_2+He}\,\mu_{\rm H_2+He} = \mu_{\rm mix} - \sum_{i=1}^{n} X_i \mu_i, $$

where $\mu_i$ is the molecular mass of absorber $i$.

Individual constraints on H2 and He are not possible
because only two spectrally inactive gases can be fit.
Three or more individual spectrally inactive components
inherently lead to degeneracy because the same mean
molecular mass of the spectrally inactive gases can be
obtained by different combinations of the mixing ratios
of the gases.
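
The split of the spectrally inactive fraction into N2 and primordial gas amounts to solving two linear equations: the mixing ratios must sum to one, and the mixing-ratio-weighted molecular masses must sum to the mean molecular mass. A minimal sketch follows; the function name is hypothetical, and the default mass of 2.3 u for the H2 + He mixture is an assumed, roughly solar value rather than a number from the text.

```python
def split_inactive_gases(mu_mix, absorbers, mu_n2=28.0, mu_primordial=2.3):
    """Return (X_N2, X_primordial) given the mean molecular mass and absorbers.

    mu_mix: mean molecular mass of the whole atmosphere [u]
    absorbers: dict mapping a name to (mixing_ratio, molecular_mass [u])
    mu_primordial: assumed mass of the H2 + He mixture [u] (hypothetical default)
    """
    x_inactive = 1.0 - sum(x for x, _ in absorbers.values())
    # Part of the mean molecular mass that the inactive gases must supply.
    mu_remaining = mu_mix - sum(x * m for x, m in absorbers.values())
    # Solve: x_n2 + x_prim = x_inactive
    #        x_n2 * mu_n2 + x_prim * mu_primordial = mu_remaining
    x_n2 = (mu_remaining - mu_primordial * x_inactive) / (mu_n2 - mu_primordial)
    x_prim = x_inactive - x_n2
    return x_n2, x_prim
```

With a third inactive gas the system would have three unknowns and still only two equations, which is the degeneracy described above.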

In reality, the effective refractive index of the gas mixture
varies depending on the composition and affects the
transit depth offset of the Rayleigh scattering signature
(Section 2.2.1). When simultaneously retrieving the mixing
ratios of all gases, however, we also determine the
refractive index in the process because the refractive index
is a direct function of only the mixing ratios and not an
additional unknown.

4.1.4. Surface Pressure 

We can discriminate between a thick, cloud-free atmo- 
sphere and an atmosphere with a surface, where the sur- 
face is either the ground or an opaque cloud deck. For 
atmospheres with an upper surface at pressures lower
than $P_{\rm surf} \sim 100$ mbar ... 5 bar (depending on composition),
we can quantitatively constrain the pressure at this
surface. For a thick atmosphere, we can identify a lower 
limit on the surface pressure. 

A surface strongly affects the part of the spectrum
without absorption features while having only a weak
or negligible effect on the part of the transmission spectrum
with strong molecular absorption or scattering. In
the spectral regions with weak absorption, a thin atmosphere
has a relatively constant continuum because the
surface cuts off the grazing light beams at a radius that is
independent of the wavelength (Des Marais et al. 2002;
Ehrenreich et al. 2006). A thick atmosphere without a
surface lacks a flat continuum. 

Conceptually, the optically thick regions of the spectrum,
those for which the transit depth is independent
of the surface pressure, constrain the mixing ratios of the
molecular species in the atmosphere as described in Section 4.1.
The surface pressure can then be determined
from the transit depths in the parts of the spectrum in
which absorption and scattering are weak (Figure 3). For
a noise-free spectrum, the strongest constraint on the surface
pressure is provided by the minimum transit depth,
$D_{\rm min}$, measured across the spectrum. The minimum
transit depth determines the deepest pressure level for
which light is transmitted through the atmosphere and,
therefore, provides a lower limit on the surface pressure.
In practice, the retrieval of the mixing ratios, surface 
pressure, and other parameters is performed simultane- 
ously based on the information in the entire spectrum. 

Taking the example of a N2-CO2 atmosphere (Figure 7),
the shape of the spectral features in the 2-6 μm range
is mostly unaffected by changes in surface pressure, as
long as the surface pressure is higher than 100 mbar. For
exquisite data, the composition of the atmosphere can,
therefore, be retrieved from the 2 to 6 μm range independently
of the surface pressure. Conversely, the spectral
region between 0.5 and 2 μm is strongly affected by surface
pressure, but the effects of surface pressure and mixing
ratios are usually degenerate. Taking the retrieved
mixing ratios from the part of the spectrum unaffected
by surface pressure allows a unique determination of the
surface pressure.

4.2. Numerical Results 
4.2.1. Constraints on Composition 

In this section, we present numerical results for synthetic
JWST NIRSpec observations of the transmission
spectrum of the super-Earth GJ 1214b. In all three atmospheric
scenarios studied, we find that the analysis
of moderate spectral resolution ($R \approx 100$) transmission
spectra covering the spectral range between 0.6 and 5 μm
can provide narrow posterior probability distributions for
all absorbing gases with mixing ratios of several ppm or
higher (Figures 8-10). The well-constrained probability
distributions allow a direct inference of the most likely estimate
and credible regions (the Bayesian equivalent of confidence
intervals) for the mixing ratios of the individual
molecular species. Spectrally inactive gases can also be
constrained if their abundances are sufficient to affect the
mean molecular mass and the Rayleigh scattering signature
at short wavelengths.

For a given transmission spectrum, the relative uncertainties
in the mixing ratios of absorbing gases (e.g.,
H2O, CO2, CH4) are only weakly dependent on the absolute
values of the mixing ratios. In other words, minor
gases with mixing ratios as low as tens of ppm can
be constrained as well as the major atmospheric constituents
(e.g., Figure 10(a)). The reason for this is that
the long geometric path length of the grazing stellar light
through the atmosphere of the extrasolar planet leads to

Fig. 7. Effect of the surface pressure on the transmission spectra
of exoplanets. The transmission spectra of model atmospheres
with 99% N2 and 1% CO2 are depicted for four different surface
pressures. The radius, $R_{p,10}$, at which the atmospheric pressure is
10 mbar is set to the same value for all four. With $R_{p,10}$ set to the
same value, the transit depths in the strong CO2 absorption bands
(e.g., around 2.7, 3.3, and 4.3 μm) are independent of the surface
pressure since the grazing starlight at these wavelengths does not
penetrate to lower layers of the atmosphere. Conversely, the parts
of the spectrum (< 1.6 μm) with little molecular absorption and
scattering show a strong dependence on the surface pressure. Com- 
bining information from parts of the spectrum that are sensitive 
to surface pressure and parts of the spectrum that are insensitive 
to surface pressure enables one to find independent constraints on 
atmospheric composition and surface pressure. At sufficiently high 
surface pressures, even the spectral regions with low absorption 
cross sections become optically thick for a grazing light beam and 
the complete spectrum becomes insensitive to further increases in 
the surface pressure. For these thick atmospheres, only a lower 
limit on the surface pressure can be found. 

significant spectral features in the transmission spectrum
even for low-abundance gases (Brown 2001). Increasing
the mixing ratio increases the transit depth across
the feature, but the uncertainty in the observed transit
depth and, therefore, the uncertainty on the logarithm
of the mixing ratio remains mostly unchanged. A detection
limit does exist at low abundances, however, because
overlapping features of other absorbers may mask the features
of extremely low-abundance gases. If all spectral
regions in which the absorber is active are occupied by 
stronger features of other absorbers, then only an upper 
limit on the mixing ratio of the gas can be found (Figure 

In contrast to the absorbing gases, the uncertainties
in the mixing ratios of spectrally inactive gases (e.g.,
N2, H2) are strongly dependent on their mixing ratios.
Spectrally inactive gases affect the transmission spectrum
only through changing the mean molecular mass
and changing the transit depth difference between the
Rayleigh scattering signature and the NIR spectrum
(Section 4.1.3). If the mixing ratio of a spectrally inactive
gas is a few tens of percent or more, the effect
of the spectrally inactive gas on the atmospheric mean
molecular mass is strong and, therefore, it is relatively
easy to identify the spectrally inactive gas and constrain
its mixing ratio (Figures 5 and 10). For lower mixing
ratios, however, only weak constraints or an upper limit
can be placed on the mixing ratios of spectrally inactive
gases because their effect on the spectrum becomes
negligible (e.g., Figure 5). This is particularly true for
N2, whose molecular mass (28 u) differs by less than a factor
of two from the molecular masses of the most
common spectrally active gases, e.g., H2O (18 u), CH4
(16 u), and CO2 (44 u). Constraining the mixing ratio
of H2 is achieved down to lower mixing ratios because
its molecular mass is lower than that of most absorbing
gases by a factor of six or more.

4.2.2. Constraints on Surface Pressure 

In the retrieval output, a thin atmosphere with a surface
and a thick, cloud-free atmosphere show distinct posterior
probability distributions for the surface pressure
parameters. For atmospheres that are thin or have an
upper cloud deck at low pressure levels, the probability
distribution resembles a well-constrained, unimodal
distribution (Figure 10(d)). For thick atmospheres that
lack an observable surface, only a lower limit to the
surface pressure can be retrieved (Figure 8(d)). The posterior
probability of the surface pressure plateaus toward
high pressures, indicating that further increases in surface
pressure lead to equally likely scenarios. We emphasize
that, for a terrestrial planet, the two scenarios of
a thin atmosphere with a solid surface or a thick atmosphere
with an opaque cloud deck are not distinguishable
from the transmission spectrum.

4.2.3. Effect of Unobserved Temperature 

An inherent correlation arises between the planetary
albedo and the mean molecular mass (Figure 11) if no
direct measurements of the planetary temperature or the
planetary albedo are available. While the correlation
does not lead to uncertainties of individual parameters
that range over orders of magnitude, it can be the dominant
source of uncertainty on the composition if small
error bars are achieved for the observations of the primary
transit, but no direct measurements of the brightness
temperature or the planetary albedo are available
from secondary eclipse observations.

The reason for the correlation between mean molecular
mass and albedo is that the primary observables for
the mean molecular mass (see Section 4.1.2) constrain
the scale height rather than the mean molecular mass directly.
Given the observational constraints for the scale
height, different combinations of the atmospheric temperature
and mean molecular mass may agree equally well
with the scale height constraints imposed by the spectrum.
The atmospheric temperature, in turn, is primarily
determined by the planetary albedo, giving rise to the
correlation between planetary albedo and mean molecular
mass.
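
The degeneracy can be sketched numerically. The example below assumes the standard equilibrium temperature with full heat redistribution and an isothermal scale height, which is a simplification of the temperature modeling used in the retrieval; function names and all numbers are hypothetical.

```python
K_B = 1.380649e-23     # Boltzmann constant [J/K]
AMU = 1.66053907e-27   # atomic mass unit [kg]

def equilibrium_temperature(t_star, r_star, semi_major_axis, bond_albedo):
    """Equilibrium temperature [K], assuming full heat redistribution."""
    return (t_star * (r_star / (2.0 * semi_major_axis)) ** 0.5
            * (1.0 - bond_albedo) ** 0.25)

def mean_molecular_mass(scale_height, temperature, gravity):
    """Mean molecular mass [u] implied by a measured scale height [m]."""
    return K_B * temperature / (gravity * scale_height) / AMU

# One and the same measured scale height, interpreted with two assumed albedos.
H_OBS = 2.0e5  # hypothetical scale height [m]
t_dark = equilibrium_temperature(3000.0, 1.5e8, 2.1e9, 0.0)
t_bright = equilibrium_temperature(3000.0, 1.5e8, 2.1e9, 0.6)
mu_dark = mean_molecular_mass(H_OBS, t_dark, 9.0)
mu_bright = mean_molecular_mass(H_OBS, t_bright, 9.0)
```

For a fixed scale height the inferred mean molecular mass scales as $(1 - A)^{1/4}$ in this sketch, so a brighter planet is fit by a lighter atmosphere.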

A higher mixing ratio of H2 lowers the mean molecular
mass without creating new absorption features. An atmosphere
with more H2 and less of the main constituent
(here: H2O) in conjunction with an increased planetary
albedo shows virtually the same transmission spectrum
as the one shown in Figure 8. As a result, the posterior
distribution shows a significant correlation between $X_{\rm H_2}$
and the Bond albedo as well as between $X_{\rm H_2}$ and $X_{\rm H_2O}$
(Figure 11).

4.3. Elemental Abundances 

The ability to constrain the mixing ratios of both the
absorbing and the spectrally inactive gases in the atmosphere
provides us with the opportunity to probe the
relative abundances of the volatile elements H, C, O,
and N of the atmospheres of exoplanets. Conceptually,
the retrieval of the elemental abundances in the atmosphere
is directly linked to the retrieval of the molecular
mixing ratios, since the constraints on elemental abundances
are derived from the probability density distribution
of the molecular mixing ratios (Section 2.5). Following
the result in the previous subsections, low-noise
observations of moderate to high spectral resolution lead
to well-constrained molecular mixing ratios and therefore
also allow determination of well-constrained elemental
abundances.
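
The link between molecular mixing ratios and elemental abundances is simple bookkeeping of atoms per molecule. A minimal sketch, applied to one set of mixing ratios (in practice one would presumably apply it to each posterior sample); the stoichiometry table and function name are illustrative and helium is ignored.

```python
# Atoms per molecule for the gases considered in the text.
ATOMS = {
    "H2O": {"H": 2, "O": 1},
    "CO2": {"C": 1, "O": 2},
    "CH4": {"C": 1, "H": 4},
    "N2": {"N": 2},
    "H2": {"H": 2},
}

def elemental_fractions(mixing_ratios):
    """Convert volume mixing ratios into number fractions of H, C, O, and N.

    mixing_ratios: dict mapping a molecule name to its volume mixing ratio.
    Returns the relative abundance of each element by number of atoms.
    """
    counts = {}
    for molecule, x in mixing_ratios.items():
        for element, n_atoms in ATOMS[molecule].items():
            counts[element] = counts.get(element, 0.0) + n_atoms * x
    total = sum(counts.values())
    return {element: c / total for element, c in counts.items()}
```

Mapping every sample of the molecular posterior through such a function would yield a posterior for the elemental abundances without any additional fitting.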

For quantitative constraints, we return to our three 
scenarios for hot super-Earth atmospheres. The transmission
spectra can clearly discriminate the different relative
abundances of the volatile elements in the three
scenarios (Figure 12) and may be used to probe the planets'
formation history and evolution. The hot mini-Neptune
scenario can be identified to have accreted and retained 
a primordial atmosphere dominated by hydrogen, similar 
to gas and ice giants in our solar system. At the other 
end of the parameter space, the elemental composition 
of the second scenario indicates an atmospheric compo- 
sition dominated by nitrogen. The third scenario shows 
an atmosphere that has retained some hydrogen in heav- 
ier molecular species. 

4.4. Total Atmospheric Mass 

We find that transmission spectra present an opportunity
to determine a lower limit for the total mass of
the atmosphere of extrasolar planets based purely on observations.
Conceptually, the constraints on the total
atmospheric mass are derived from the constraints on 
the composition and the surface pressure of the atmo- 
sphere. The ability to constrain mixing ratios of both 
the absorbing and the spectrally inactive gases in the 
atmosphere enables us to constrain the mean molecular 
mass and therefore the mass density in the atmosphere as 
a function of pressure. Combined with the independent 
constraints on the surface pressure, we can integrate the 
mass density to estimate the total column density of the 
atmosphere. Under the assumption of an approximately 
uniform bulk composition and surface pressure around 
the spherical planet, we therefore obtain a constraint on 
the total mass of the atmosphere. 
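
For a rough sense of scale, the estimate can be sketched in the thin-atmosphere limit. This is a simplification and not the integration described above: with constant gravity, hydrostatic equilibrium makes the column mass equal to the surface pressure divided by gravity, independent of composition. The function name and the Earth-like numbers are illustrative only.

```python
import math

def atmospheric_mass_above_surface(surface_pressure, radius, gravity):
    """Mass [kg] of a thin hydrostatic atmosphere above a surface or cloud deck.

    Assumptions: constant gravity over the depth of the atmosphere and a
    uniform surface pressure around the planet. The column mass per unit
    area is then P / g, and the total mass is 4 * pi * R^2 * P / g.
    surface_pressure [Pa], radius [m], gravity [m/s^2].
    """
    return 4.0 * math.pi * radius ** 2 * surface_pressure / gravity

# Earth-like values, used only as a plausibility check.
m_atm = atmospheric_mass_above_surface(1.01325e5, 6.371e6, 9.81)
```

If only a lower limit on the surface pressure is available, the same expression gives a lower limit on the atmospheric mass.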

Two fundamental limits prevent one from accurately 
constraining the total atmospheric mass. First, the mass 
determined from transmission spectra corresponds to the 
total atmospheric mass above the uppermost surface (see
Figure 10(f) for a quantitative example). Second, following
the arguments on the retrieval of the surface pressure
(Section 4.1.4), we will not be able to detect the
uppermost surface explicitly if the cloud-free part of the
atmosphere is sufficiently thick (Figures 8 and 9). In
conclusion, we can always determine a lower limit on the 
atmospheric mass once we have detected spectral fea- 
tures, but determining an upper limit is only possible if 
the atmosphere is sufficiently thin for the surface to be 
detected and an opaque cloud deck can be excluded from 
theoretical principles. 

5. DISCUSSION

Fig. 8. Synthetic transit observations and atmospheric retrieval results for the "Hot Halley world" scenario for the super-Earth GJ 1214b.
The synthetic observations shown in panel (a) were simulated considering 10 transit observations with JWST NIRSpec and assuming that
observational uncertainties within 20% of the shot noise limit are achieved (see Section 3.2). The dashed line shows the analytical Rayleigh
scattering slope for comparison. Panels (b)-(g) illustrate the marginalized posterior probability distribution for the atmospheric parameters
retrieved from the synthetic observations. For illustrative purposes, the distributions are normalized to a maximum value of 1. The asterisks 
indicate the values of the atmospheric parameters used to simulate the input spectrum. The narrow, single-peaked, posterior probabilities 
for the mixing ratios of H2O, CO2, CH4, and H2 in panel (b) indicate that unique constraints on the abundance of these gases can be 
retrieved in agreement with the atmospheric parameters used to simulate the input spectrum. H2O can be identified as the main constituent. 
Only an upper limit on the mixing ratio of N2 can be found because small amounts of the spectrally inactive N2 have a negligible effect on 
the observed transmission spectrum. Constraints are also obtained for the surface/cloud-top pressure and total atmospheric mass above the 
surface/cloud-top (Panels (f) and (g)). In this scenario, the atmosphere is cloud-free down to high pressure levels, thus only a lower bound
on the surface pressure can be found. No upper bound can be inferred as indicated by the posterior probability distributions approaching 
the flat prior distribution at high surface pressures. 



5.1. Obtaining Observational Constraints on 
Atmospheric Composition 

The objective in the development of the new retrieval 
methodology was to remain independent of model as- 
sumptions as much as possible and let the observational 
data speak for themselves. By not employing any as- 
sumptions on the elemental composition, chemical equi- 
librium, or formation and evolution arguments in the re- 
trieval process, our results remain independent of precon- 
ceived ideas for the planet under investigation. The at- 
mospheric composition is, instead, completely described 
by free parameters and no hidden biases or asymmetries 
favoring a particular molecular species in the Bayesian 
prior are introduced.

The main assumptions in our approach are limited to 
the principles of radiative transfer in local thermody- 
namic equilibrium, hydrostatic equilibrium, and the cor- 
rectness of the molecular line lists. For cases in which no 
secondary eclipse measurements are available, we added 
radiative-convective equilibrium to determine a reason- 
able temperature structure. However, since the exact 
temperature profile has a secondary effect on the transmission
spectrum, we find that this temperature modeling
has little effect on the retrieval results we obtain.
In order to reasonably constrain the atmosphere given 
the limited data available in the near future, another 
guideline in the development was to keep the number of 
parameters to a minimum, while still ensuring that the 
parameters uniquely define the state of the model atmo- 
sphere. In this study, we assigned a single free parameter 
for the effective mixing ratio of each molecular species in 
the atmosphere, effectively comparing well-mixed atmospheres
to the observation (see Section 5.5).

The main advantage of our retrieval approach for
super-Earths over detailed modeling of atmospheric
chemistry and dynamics is that it provides an opportunity
to discover unexpected types of planets and
atmospheres that do not agree with our current understanding
of formation, evolution, and atmospheric
processes. For example, no self-consistent atmospheric
chemistry model would predict that the atmosphere of
a terrestrial planet like Earth has an O2 mixing ratio as
high as 21%. Only the direct interpretation of observations
can tell us about the existence of such unusual
atmospheric compositions. Identifying O2 absorption
lines without constraining the
high mixing ratio would not be a biosignature because
low abundances of O2 can be a result of photochemical
production.

In this work, we have shown that we can quantitatively 
constrain the atmospheric composition based on observa- 
tions of the transmission spectrum, even for super-Earth 
planets for which the composition is completely unknown 
a priori. Transmission spectroscopy is a good tool for retrieval
of the composition because the absorber amount
and mean molecular mass are the main drivers determining
the features of the transmission spectrum, while
the influence of the unknown temperature profile is sec- 
ondary. We also find, however, that the characteriza- 
tion of super-Earth planets requires considerably more 
spectral coverage and precision than the characterization 
of the hydrogen-dominated atmospheres of hot Jupiters. 
This is not only because of the smaller signal, but also 
because of a more complex parameter space that can re- 
sult in degeneracies. 



5.2. Non-Unique Constraints for Hazy Atmospheres 



[Figure panel residue. Axes: Wavelength [μm]; Volume Mixing Ratio; Mean Molecular Mass [u]. Curve label: Rayleigh.]






Fig. 9. Synthetic transit observations and atmospheric retrieval results for the "Hot nitrogen-rich world" scenario of a super-Earth
with the physical properties of GJ 1214b. The panels are the same as in Figure 8. Observational errors were modeled for 10
transit observations with JWST. The narrow posterior distributions for the mixing ratios of H2O, CO2, CH4, and N2 indicate that unique
constraints on the abundance of these gases can be retrieved in agreement with the atmospheric parameters used to simulate the input
spectrum. N2 can be identified as the main constituent of the atmosphere due to its effect on the mean molecular mass and the Rayleigh
signature. While atmospheric models with $X_{\rm H_2} \sim 0.1 \ldots 1\%$ are favored by the synthetic observations, atmospheric models with $X_{\rm H_2} \to 0$
retain a significant probability and no lower bound on $X_{\rm H_2}$ can be found. The most likely value for the surface pressure is in agreement with
the surface pressure parameter used to simulate the input spectrum, suggesting that the atmosphere is optically thin at some wavelengths.
The synthetic observations are not sufficient, however, to find a statistically significant upper limit on the surface pressure and fully exclude
a thick envelope.

Fig. 10. Synthetic transit observations and atmospheric retrieval results for the "Hot mini-Neptune" scenario of a super-Earth with the
physical properties of GJ 1214b. The panels are the same as in Figure 8. Observational errors were modeled for a single transit
observation with JWST NIRSpec. Note the difference in the scale of the transit depth axis compared to Figures 8 and 9. The narrow
posterior distributions for the mixing ratios of H2O, CO2, and H2 indicate that unique constraints on the abundance of these gases can be
retrieved in agreement with the atmospheric parameters used to simulate the input spectrum. Based on the low mean molecular mass, H2
can clearly be identified as the main constituent of the atmosphere. N2 mixing ratios larger than a few percent can be excluded. An upper
limit at the ppm level can be found for the mixing ratio of CH4. A surface (here: due to the opaque cloud deck) can be identified at a
pressure level between 65 and 150 mbar with 3σ confidence.



Photochemically produced hazes may have a significant opacity at short wavelengths and may
mask the signature of molecular Rayleigh scattering if they are present in the upper
atmosphere. While we may still be able to probe the near-infrared spectrum and identify
molecular absorbers, we will not be able to measure the transit depth offset caused by
molecular Rayleigh scattering. Without making further assumptions, we will, therefore, lose
the ability to constrain the mixing ratios of the molecular species to within orders of
magnitude, even for the major constituents of the atmosphere (see Section 4.1).

By measuring either the slope of the Rayleigh scattering signature or the shapes of molecular
absorption features, we will still obtain information on the scale height of the atmosphere
and, therefore, obtain an estimate of the mean molecular mass (Section 4.1.2). We will not,
however, be able to constrain the total amount of the spectrally inactive gases. Since the
near-infrared spectrum only constrains the relative abundances of the absorbing gases
(Section 4.1.1), we can hypothesize different atmospheric mixtures of H2, N2, and absorbing
gases in the correct ratios that produce nearly identical transmission spectra. As a result,
we obtain a degeneracy that prevents us from constraining the molecular abundances uniquely.

One assumption that could be made to compensate for the lack of information is to exclude
the simultaneous presence of nitrogen gas, N2, and hydrogen gas, H2. In general, however,
the simultaneous presence of N2 and H2 cannot be ruled out. Although ammonia, NH3, is the
preferred chemical form of the two elements N and H in chemical equilibrium over a wide range
of temperatures and pressures, the energy barrier for the reaction is too large due to the
strong triple bonds in nitrogen molecules.

Fig. 11. — Two-dimensional marginalized probabilities for pairs of atmospheric properties for simulated JWST NIRSpec observations of
the "Hot Halley world" scenario for GJ 1214b. The synthetic observation used for the atmospheric retrieval is illustrated in Figure 8. For
observations that cover all n + 4 observables discussed in Section 4.1, the posterior distribution of the atmospheric parameters lacks
degeneracies or strong correlations that would keep individual parameters unconstrained over orders of magnitude (Figure 11). Planetary
albedo and the mean molecular mass show a correlation because different combinations of atmospheric temperature and mean molecular
mass may lead to the same scale height and, therefore, to similar spectral feature shapes.

Fig. 12. — Quaternary diagram illustrating the posterior probability distributions for the
relative abundances of the elements H, C, O, and N. The colored volumes represent the 2σ
Bayesian credible regions of the elemental composition for the "hot Halley world" (blue), the
"hot nitrogen-rich world" (red) and the "hot mini-Neptune" (green). The symbols E, M, V, and S
indicate the elemental abundances in the atmospheres of the Solar System planets Earth, Mars,
Venus and Saturn, respectively, for comparison. The four vertices of the diagram represent an
atmosphere that is fully composed of H, C, O, or N. The opposing faces are surfaces on which
the fraction of H, C, O, or N is zero. At each point inside the tetrahedron, the elemental
fraction is given by the distances perpendicular to the faces.



5.3. Stratified Atmospheres 

Our parameterization of the atmosphere in the retrieval process assumes a well-mixed
atmosphere. Given the limited amount of data available in the near future, we adopt this
assumption to keep small the number of free parameters that draw on similar information in
the spectrum. Observations of the Solar System planets justify the approach because ≳95% of
the gas in each of the Solar System atmospheres is composed of long-lived, chemically stable
species that were mixed by turbulence and diffusion for a sufficiently long time (see
Lodders & Fegley 1998; de Pater & Lissauer 2010, and references therein). If exquisite
observations become available in the future, however, it may be useful to extend the
parameterization to retrieve compositional gradients. For some molecular species, such
gradients may be identified as biomarkers caused by sources at or in the planetary surface.

Physical effects that lead to compositional stratifica- 
tions of the gaseous species in the Solar System atmo- 
spheres are (1) condensation of gases that condense at 
pressure and temperature levels encountered in the at- 
mosphere, or (2) production or destruction of gas by pho- 
tochemistry or geology, or (3) variation of chemical equi- 
librium with altitude due to the altitude dependencies 
of pressure and temperature. Changes of gas concen- 
tration with altitude that are caused by condensation, 
however, are usually not relevant for our retrieval because
transmission spectroscopy only probes layers above
the condensation clouds. Similarly, strong changes in the 
chemical equilibrium usually occur at deep levels in thick 
envelopes that are unlikely to be probed in transmission. 
In addition, the mixing ratios of gases that do vary with
altitude often vary by less than one order of magnitude,
e.g., CO, H2O, and SO2 in the atmosphere of Venus
(Hunten 1983). Observational data that are less noisy
than the synthetic JWST observations considered in
Section 4.2 are necessary to robustly detect such gradients
because the retrieved mixing ratios for minor species in
the synthetic JWST observations are uncertain to within
one order of magnitude, even for well-mixed atmospheres
(Section 4.2). Photochemistry or surface sources, however,
may lead to concentration gradients that are substantial
at pressure levels probed by transmission spectroscopy
(e.g., ozone in Earth's atmosphere) and may
justify extensions to our parameterization in the future. 

For atmospheres with a stratified composition, our retrieval method determines an
altitude-averaged mixing ratio that best matches the observed transmission spectrum. In test
cases, we verified that the retrieval method provides a reasonable estimate of the mixing
ratios for stratified atmospheres. We simulated transmission spectra for stratified
atmospheres and performed the retrieval assuming a well-mixed atmosphere (Figure 13). We
found that the retrieved mixing ratios for stratified gases correspond to the mixing ratios
at the pressure levels at which the functional derivatives with respect to the mixing ratio
are the highest. Using the functional derivatives, we can, therefore, estimate a posteriori
at which pressure level we have probed the atmospheric mixing ratio of the gas.
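
As a minimal sketch of the stratified test case, the log-linear CO2 profile quoted in the caption of Figure 13 (5% at 1 bar decreasing to 0.1% at 0.01 mbar) can be evaluated at the pressure levels where the functional derivatives peak. The function name and the interpolation in log10 of pressure are assumptions made for illustration; they are not taken from the retrieval code.

```python
import math

# Anchor points of the log-linear profile quoted in the Figure 13 caption.
P_BOTTOM_MBAR, X_BOTTOM = 1000.0, 0.05   # 5% at 1 bar
P_TOP_MBAR, X_TOP = 0.01, 0.001          # 0.1% at 0.01 mbar


def mixing_ratio_co2(p_mbar):
    """Volume mixing ratio at pressure p_mbar, linear in log10(X) versus log10(P).

    Hypothetical helper for illustration only.
    """
    slope = (math.log10(X_BOTTOM) - math.log10(X_TOP)) / (
        math.log10(P_BOTTOM_MBAR) - math.log10(P_TOP_MBAR)
    )
    log_x = math.log10(X_TOP) + slope * (math.log10(p_mbar) - math.log10(P_TOP_MBAR))
    return 10.0 ** log_x


if __name__ == "__main__":
    for p in (1000.0, 10.0, 1.0, 0.01):
        print(f"P = {p:g} mbar -> X_CO2 = {100 * mixing_ratio_co2(p):.2f}%")
```

Under these assumptions the profile gives roughly 1% at 10 mbar and roughly 0.5% at 1 mbar, which is the range a well-mixed retrieval would be expected to return if it is most sensitive to the 1-10 mbar level.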

5.4. A Predictive Tool for Planning Observational 
Programs and Designing Future Telescopes 

Additional applications of the retrieval method presented here are (1) to evaluate and
optimize observational strategies in the planning and proposal process of exoplanet
observations and (2) to guide the design of future telescopes and instrumentation for the
characterization of exoplanets. Numerical studies using the retrieval method can provide
concrete guidelines on how many transits must be observed and what spectral range and
spectral resolution are ideal for a specific atmospheric characterization. The motivation
behind the approach is that observational characterizations of super-Earth atmospheres are
extremely challenging and the observation of many transits with highly capable observatories
will be required.

While the retrieval method is not essential to recognize the need for higher S/N data than
currently available, it is critical for determining exactly how much data is required for a
useful atmospheric characterization. GJ 1214b is a good example: in the past, one or two
transits were observed at different wavelengths by various observers with the goal of
characterizing the atmosphere (Bean et al. 2010, 2011; Croll et al. 2011; Berta et al. 2012).
Even though some of these observations approached the theoretical photon limit, few
constraints on the atmosphere could be found. Retrieval analysis on simulated data shows that
ten or more transits are required with currently available observatories in order to separate
the two currently most plausible scenarios of a water world and a hydrogen-dominated
atmosphere with high-altitude clouds (B. Benneke et al., in preparation). We propose a new
paradigm in planning observations in which retrieval analysis of synthetic observations can
quantitatively justify the necessity of large campaigns with ground-based or space-based
observatories for atmospheric characterization.
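
For orientation only, a first-order estimate of the number of transits can be sketched under the assumption of independent white noise, so that the uncertainty of the co-added transit depth scales as 1/sqrt(N). This back-of-the-envelope scaling is not the retrieval analysis on synthetic observations described above, and it would break down for correlated noise. The function name and all numbers are hypothetical.

```python
import math


def transits_needed(sigma_single_ppm, feature_amplitude_ppm, target_snr):
    """Smallest N with feature_amplitude / (sigma_single / sqrt(N)) >= target_snr.

    Assumes independent, identically distributed (white) noise per transit,
    so the uncertainty of the co-added transit depth scales as 1/sqrt(N).
    """
    if min(sigma_single_ppm, feature_amplitude_ppm, target_snr) <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil((target_snr * sigma_single_ppm / feature_amplitude_ppm) ** 2)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any observation.
    print(transits_needed(sigma_single_ppm=120.0,
                          feature_amplitude_ppm=50.0,
                          target_snr=3.0))
```

Such a scaling only indicates when a single feature becomes detectable; which atmospheric parameters are actually constrained still has to be established by running the retrieval on synthetic data.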

A conceptual understanding of which details in the
spectrum are required to constrain the atmospheric
composition will enable observers to rationally select the
wavelength ranges and spectral resolutions of transit
observations for atmospheric characterization. For example,
constraining the volume mixing ratios of molecular
absorbers in super-Earth atmospheres will require
measuring the Rayleigh scattering signature in addition to
the molecules' absorption signatures in the infrared. If
the Rayleigh scattering signature is not observed, even 
low-noise observations of the spectral features in the 
near-infrared with JWST will not provide the informa- 
tion required to determine the volume mixing ratios. 

In addition to the general results, we used simulated 
JWST NIRSpec observations covering the full range 
from 0.6 to 5 μm to show which quantitative constraints
on the composition could be obtained with JWST NIR- 
Spec. An assessment of how well the different atmo- 
spheric properties can be constrained, how many transits 
are needed, or what observational parameters are opti- 
mal needs to be done with a specific scientific objective 
and the available instruments in mind. Therefore, we
envision using the methodology in the future in
collaboration with observers to evaluate near-future observational
opportunities with currently available instruments. 

5.5. Compositional Retrieval versus Detailed 
Atmospheric Modeling 

The atmospheric retrieval method and detailed, self-consistent modeling of a planetary
atmosphere (Burrows et al. 1997; Seager et al. 2005; Burrows et al. 2008) represent two
complementary approaches to the study of planetary atmospheres. For studies of Solar System
planets, it is common practice to use observational constraints from remote sensing to
motivate or validate detailed modeling of chemistry or dynamics. As a classic example, the
temperature-pressure profile retrieved from radio occultation measurements on Titan motivated
and guided detailed modeling of the thermal structure to explain the measured temperature
profile (McKay et al. 1989). Similarly, the observational detection of and abundance
constraints on the methane plume in the Martian atmosphere motivated a multitude of studies
on potential sources and sinks (e.g., Lefèvre & Forget 2009; Krasnopolsky et al. 2004).

Fig. 13. — Atmospheric retrieval for stratified atmospheres. The left panel illustrates the volume mixing ratio profiles used to simulate
the synthetic JWST observations for a stratified atmosphere scenario. In this scenario, the volume mixing ratio X_CO2 is chosen to decrease
log-linearly from 5% at 1 bar down to 0.1% at 0.01 mbar. The error bars of the observations are similar to the ones in Figure 9 (synthetic
observations are not shown). The middle panel illustrates the marginalized posterior probability density as obtained when performing
atmospheric retrieval on the synthetic observations. The abundances of all molecular species are robustly retrieved. The most likely value
for X_CO2 matches the value at the 1-10 mbar level because the functional derivatives, averaged across the observed spectrum, are highest
for this pressure level (right panel).

We envision the same kind of complementarity for the 
characterization of exoplanets. The strategy would be 
to use the retrieval method as presented in this work 
to identify quantitative constraints on the atmospheric 
composition provided by the observations. Then, the 
constraints on elemental or molecular abundances can 
serve as inputs to help guide the detailed atmospheric 
modeling to explore physical processes that would ex- 
plain the findings. Conversely, atmospheric retrieval can 
be complemented by self-consistent forward models in 
that self-consistent modeling can further constrain the 
parameter space by checking the physical plausibility of 
the atmospheric scenarios. 

Chemistry-Transport and Photochemistry Models — While 
self-consistent forward models of atmospheric chemistry 
aim to provide the physical understanding of the rele- 
vant processes in the atmosphere, self-consistent forward 
models are dependent on inputs such as the background 
atmosphere or elemental abundances, the boundary con- 
ditions at the surfaces, as well as an accurate represen- 
tation of all relevant chemical reactions, heat transport 
and cloud formation processes. If we knew all inputs 
and relevant processes a priori, we could compute the
chemical composition and state of the atmosphere with 
a self-consistent model. However, many of these inputs 
will not be known for exoplanets, especially for planets 
that do not agree with our preconceived ideas. 

Atmospheric retrieval provides an alternative to self-consistent modeling for obtaining the
composition and state of the atmosphere, but is based on observations rather than detailed
modeling. It can, therefore, guide the development and application of self-consistent models
by providing constraints on the background atmosphere as well as constraints on the minor
species in the atmosphere. As the background atmospheres of super-Earth planets are not known
a priori, it appears that most self-consistent models of atmospheric chemistry will require
atmospheric retrieval to infer at least the main species in the atmospheres of these objects.
If the self-consistently modeled atmospheric properties deviate from the ones provided by the
retrieval analysis of a given set of observations, then the analysis of the deviation may
motivate the inclusion of additional physical or chemical processes in the self-consistent
model (e.g., additional sources and sinks for molecular species). The combination of
self-consistent modeling and atmospheric retrieval to interpret observations can, therefore,
enhance our understanding of the physical processes in the atmospheres of extrasolar planets.

Atmospheric Dynamics — One of the most critical factors 
affecting atmospheric circulation models is the pressure level at which the bulk of the
stellar energy is deposited (Heng 2012; Perna et al. 2012). Atmospheric retrieval may provide
a useful input to determine this pressure level for an observed exoplanet because it
constrains the molecular composition of the atmosphere, which strongly affects the opacity of
the atmosphere to the incident stellar flux. For hot Jupiters, previous studies (e.g.,
Showman et al. 2009) assumed chemical equilibrium in combination with solar composition as a
fiducial estimate of the composition. When modeling a specific planet, the danger is that
these assumptions for the composition introduce inaccuracies in the deposition of stellar
light and therefore alter the results.

For circulation modeling of super-Earth atmospheres, 
obtaining observational constraints on the atmospheric 
properties is critical. Without observations or a bet- 
ter understanding of super-Earth planets, even the main 
constituents of these atmospheres are unknown, and 
therefore no fiducial assumptions on the composition and 
stellar flux deposition can be made. For rocky planets, 
the presence of a solid surface and the pressure level at 
the surface play a major role in the atmospheric circu- 
lation. The retrieved surface pressure from observations 
may, therefore, also provide an essential input for circu- 
lation models. 

6. SUMMARY AND CONCLUSIONS

We have presented a Bayesian method to retrieve the 
atmospheric composition and thickness of a super-Earth 
exoplanet from observations of its transmission spec- 
trum. Our approach is different from previous work on 
super-Earths in that we do not test preconceived scenar- 
ios, but retrieve constraints on the atmospheric proper- 
ties governed by observations, assuming no prior knowl- 
edge of the nature of the planet. Our work extends pre- 
vious work on atmospheric retrieval for hot Jupiters in 
that we introduce a parameterization that is applicable 
to general atmospheres in which hydrogen may not be the 
dominating gas and clouds may be present. We infer con- 
straints on individual parameters directly by marginaliz- 
ing the joint posterior probability distribution of the at- 
mospheric parameters. The uncertainty of individual pa- 
rameters introduced by complicated, non-Gaussian cor- 
relations with other parameters is, therefore, accounted 
for in an elegant and straightforward way. 
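
The marginalization step can be illustrated with a short sketch, assuming that joint posterior samples (for example, from an MCMC run) are already available as a list of parameter tuples. The function names and the toy samples are hypothetical and serve only to show that marginal constraints, including one-sided bounds, follow directly from the samples without any Gaussian assumption.

```python
import math


def marginal(samples, index):
    """Marginalize joint posterior samples onto one parameter by dropping the others."""
    return [s[index] for s in samples]


def upper_bound(values, level):
    """One-sided credible upper bound: smallest sample value v such that at
    least a fraction `level` of the samples is <= v."""
    if not values or not 0.0 < level <= 1.0:
        raise ValueError("need samples and 0 < level <= 1")
    ordered = sorted(values)
    k = math.ceil(level * len(ordered))  # number of samples at or below the bound
    return ordered[k - 1]


if __name__ == "__main__":
    # Toy joint samples of (log10 mixing ratio, surface pressure in bar);
    # placeholders standing in for real MCMC output.
    samples = [(-6.0 + 0.004 * i, 0.1 + 0.001 * (i % 7)) for i in range(1000)]
    print(upper_bound(marginal(samples, 0), 0.95))
```

Because the bound is read off the sorted samples, correlations with the other parameters are accounted for automatically: they are already encoded in how the joint samples are distributed.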

In this work, we have applied the retrieval method 
to synthetic observations of the super-Earth GJ 1214b. 
We investigated which constraints on the atmospheres of 
super-Earth exoplanets can be inferred from future ob- 
servations of their transmission spectra. Our most sig- 
nificant findings are summarized as follows. 

• A unique constraint of the mixing ratios of the
absorbing gases and up to two spectrally inactive
gases is possible with moderate-resolution (R ~
100) transmission spectra if the spectral coverage
and S/N of the observations are sufficient to
quantify (1) the transit depths in at least one
absorption feature for each absorbing gas at visible
or near-infrared wavelengths and (2) the slope and
strength of the molecular Rayleigh scattering
signature at short wavelengths. Assuming that the
atmosphere is well-mixed, and that N2 and a
primordial mix of H2 + He are the only significant
spectrally inactive components, one can therefore
uniquely constrain the composition of the
atmosphere based on transit observations alone.

• We can discriminate between a thick, cloud-free at- 
mosphere and an atmosphere with a surface, where 
the surface is either the ground or an opaque cloud 
deck. For an atmosphere with a surface at low 
optical depth, we can quantitatively constrain the 
pressure at this surface. A unique constraint of the 
composition is also possible for an atmosphere with 
a surface. 

• An estimate of the mean molecular mass made in- 
dependently of the other unknown atmospheric pa- 
rameters is possible by measuring either the slope 
of the Rayleigh scattering signature, the shape of 
individual absorption features, or the relative transit
depths in different features of the same molecular
absorber. For super-Earths, discriminating between
hydrogen-rich atmospheres and high mean
molecular mass atmospheres is, therefore, possible, 
even in the presence of clouds. 

• Determining the volume mixing ratios of the absorbing gases relies on observations of
the molecular Rayleigh scattering signature. Although the presence of most molecular
species can be identified in the near-infrared, only the relative abundances of the
absorbing molecules can be determined from the infrared spectrum, not their volume mixing
ratios in the atmosphere. The Rayleigh signature of molecular scattering is required
because it enables the measurement of the abundances of spectrally inactive gases. If the
molecular Rayleigh scattering cannot be observed or is masked by haze scattering at short
wavelengths, we will not be able to determine the volume mixing ratios of the gases in
the atmosphere to within orders of magnitude. This inability to constrain the mixing
ratios was not discovered in previous work on atmospheric retrieval because hot Jupiters
were assumed to be cloud-free and the mean molecular mass in a hydrogen-dominated
atmosphere was known a priori (e.g., Madhusudhan & Seager 2009).

• The retrieval of the mixing ratios of spectrally inac- 
tive gases is fundamentally limited to two indepen- 
dent components. An inherent degeneracy arises 
if the atmosphere contains three or more indepen- 
dent spectrally inactive gases because the same 
mean molecular mass and the same strength of the 
Rayleigh scattering signature can be obtained with 
different combinations of the gases. 

• Non-Gaussian treatments of the uncertainties of at- 
mospheric parameters are essential for atmospheric 
retrieval from noisy exoplanet observations. Even 
given low-noise synthetic observations as consid- 
ered in this work, only one-sided bounds and highly 
non-Gaussian correlations exist for some atmo- 
spheric parameters. Non-Gaussian effects will be- 
come stronger for observational data sets noisier 
than the synthetic data considered in this work be- 
cause the relation between the observables and the 
desired atmospheric parameters is highly nonlinear 
and larger volumes of the parameter space become 
compatible with noisier observations. A limitation
of optimum estimation retrieval (Lee et al. 2011;
Line et al. 2012) for the analysis of noisy exoplanet
spectra is, therefore, that the extent of the confi- 
dence regions of atmospheric properties cannot cor- 
rectly be described by Gaussian errors around a 
single best-fitting solution. 
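
The limit of two spectrally inactive gases stated in the list above can be illustrated by counting constraints. The sketch below assumes, as a simplification, that the observables provide three constraints that are linear in the mixing ratios (normalization to unity, mean molecular mass, and strength of the Rayleigh signature) and that the infrared features fix the relative abundances of the absorbers, so that the absorbers enter as a single unknown. The masses are standard values; the relative Rayleigh cross sections are rough placeholder values, and all function names are hypothetical.

```python
def rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


def free_directions(components):
    """Number of directions in mixing-ratio space left unconstrained by the
    three linear constraints: sum = 1, mean molecular mass, Rayleigh strength."""
    rows = [
        [1.0 for _ in components],
        [mass for mass, _ in components],
        [sigma for _, sigma in components],
    ]
    return len(components) - rank(rows)


# (molecular mass [u], Rayleigh cross section relative to H2).
# The cross sections are rough placeholders for illustration only.
ABSORBER_MIX = (44.01, 13.0)  # absorbers with fixed relative abundances (CO2-like)
H2 = (2.016, 1.0)
N2 = (28.013, 4.7)
AR = (39.948, 4.2)

if __name__ == "__main__":
    print(free_directions([ABSORBER_MIX, H2, N2]))      # two inactive gases
    print(free_directions([ABSORBER_MIX, H2, N2, AR]))  # three inactive gases
```

With two inactive gases, the three constraints leave no free direction; adding a third independent inactive gas leaves a one-parameter family of mixtures that reproduce the same mean molecular mass and Rayleigh strength.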

Our findings indicate that the retrieval method pre- 
sented here, combined with low-noise observations, will 
provide the opportunity to observationally character- 
ize atmospheres of individual super-Earth planets and 
uniquely identify their molecular and elemental compo- 
sitions. Similar to observational constraints on the at- 
mospheres of the Solar System planets obtained over the 
last decades, the quantitative constraints obtainable with 
our atmospheric retrieval will generally be independent of 
preconceived ideas of atmospheric physics and chemistry 
as well as planet formation scenarios and atmospheric 
evolution. The unbiased constraints can, therefore,
motivate the detailed study of new phenomena in
atmospheric dynamics and chemistry, help assess habitability
and identify biosignatures, or provide clues to planet
formation and atmospheric evolution.

APPENDIX
AN ALGEBRAIC SOLUTION TO INFER THE MEAN MOLECULAR MASS 

For thin or cloudy atmospheres, the change in the transit depth across the spectrum, ΔD, as proposed by
Miller-Ricci et al. (2009), cannot be used to uniquely constrain the mean molecular mass because clouds, hazes,
and a surface also affect the feature depths. Here, we show that measuring the linear slope of the Rayleigh scattering
signature or the shapes of individual features, instead, does provide constraints on the atmospheric scale height and can
be used to estimate the mean molecular mass for general atmospheres independently of other atmospheric properties.
We derive an algebraic solution that can be used to infer the mean molecular mass directly from the transmission
spectrum.

From the geometry described by Brown (2001), we obtain the slant optical depth, $\tau_\lambda(b)$, as a function of the impact
parameter, $b$, by integrating the opacity through the planet's atmosphere along the observer's line of sight:

$$\tau_\lambda(b) = 2\int_b^{\infty} \sigma_\lambda(r)\,n(r)\,\frac{r\,\mathrm{d}r}{\sqrt{r^2-b^2}}. \qquad \mathrm{(A1)}$$

Here, $r$ is the radial distance from the center of the planet. For Rayleigh scattering, the extinction cross section is
only very weakly dependent on pressure and temperature, and we can assume $\sigma_\lambda(r) = \sigma_\lambda$. Furthermore, motivated by
hydrostatic equilibrium, we assume that the atmospheric number density falls off exponentially according to
$n(r) = n_0 e^{-r/H}$, where $H$ is the atmospheric scale height. With these assumptions, we can analytically perform the integration
in Equation (A1) and obtain

$$\tau_\lambda(b) = 2 n_0 \sigma_\lambda b\, K_1\!\left(\frac{b}{H}\right) \approx 2 n_0 \sigma_\lambda b \sqrt{\frac{\pi H}{2b}}\; e^{-b/H}, \qquad \mathrm{(A2)}$$

where the modified Bessel function of the second kind, $K_1(x)$, is approximated by its asymptotic form
$K_1(x) \sim \sqrt{\frac{\pi}{2x}}\; e^{-x}\left[1 + O\!\left(\frac{1}{x}\right)\right]$ for large $x$ (Bronstein et al. 1999). For spectral regions in which the atmosphere is optically
thick, the surface does not affect the transmission spectrum and the observed planet radius as a function of wavelength
can be approximated as

$$R_{p,\lambda} \approx b(\tau_\lambda = 1), \qquad \mathrm{(A3)}$$

because the number density falls exponentially with altitude, leading to a steep change in $\tau_\lambda$ as a function of $b$.
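Evaluating Equation (A2) at $\tau_\lambda = 1$ for two different wavelengths, $\lambda_1$ and $\lambda_2$, for which the extinction cross sections
are $\sigma_1$ and $\sigma_2$, forming the ratio, and solving for the scale height, we obtain

$$H\big|_{r=R_p} \approx \frac{R_{p,2} - R_{p,1}}{\ln\!\left(\frac{\sigma_2}{\sigma_1}\sqrt{\frac{R_{p,2}}{R_{p,1}}}\right)} \approx \frac{\mathrm{d}R_{p,\lambda}}{\mathrm{d}\ln\!\left(\sigma_\lambda\sqrt{R_{p,\lambda}}\right)} \approx \frac{\mathrm{d}R_{p,\lambda}}{\mathrm{d}(\ln\sigma_\lambda)}, \qquad \mathrm{(A4)}$$

where we considered the limit of $\lambda_1 \rightarrow \lambda_2$ and then approximated for $\mathrm{d}\ln\sqrt{R_{p,\lambda}} \ll \mathrm{d}\ln\sigma_\lambda$.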

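Given an estimate of the atmospheric temperature, e.g., $T \approx T_{\mathrm{eq}}$, we can observationally determine an estimate of
the mean molecular mass

$$\mu_{\mathrm{mix}} = \frac{k_B T}{g}\left(\frac{\mathrm{d}R_{p,\lambda}}{\mathrm{d}(\ln\sigma_\lambda)}\right)^{-1}\left(1 \pm \frac{\delta T}{T}\right), \qquad \mathrm{(A5)}$$

where $k_B$ is the Boltzmann constant, $g$ is the gravitational acceleration, and the factor $\left(1 \pm \frac{\delta T}{T}\right)$ accounts for the
inherent uncertainty due to the uncertainty, $\delta T$, in modeling the atmospheric temperature, $T$, at the planetary radius $r = R_{p,\lambda}$.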

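At short wavelengths for which Rayleigh scattering dominates, the extinction cross section $\sigma_\lambda$ is proportional to $\lambda^{-4}$,
and we obtain

$$H \approx -\frac{1}{4}\,\frac{\mathrm{d}R_{p,\lambda}}{\mathrm{d}\ln\lambda}. \qquad \mathrm{(A6)}$$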
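
The relations above can be checked numerically with a short sketch. It uses the two-point form of Equation (A4) with $\sigma_\lambda \propto \lambda^{-4}$, neglects the $\sqrt{R_{p,\lambda}}$ term, and converts the scale height into a mean molecular mass through $H = k_B T / (\mu g)$. The function names are hypothetical, and the radius, temperature, and gravity are illustrative inputs rather than measured values.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant [J/K]
AMU = 1.66053906660e-27  # atomic mass unit [kg]


def scale_height_rayleigh(r1_m, r2_m, lam1, lam2):
    """Two-point scale height [m] in the Rayleigh regime (sigma ~ lambda**-4).

    r1_m, r2_m: observed planet radii [m] at wavelengths lam1, lam2
    (same unit for both wavelengths). The sqrt(R_p) term is neglected.
    """
    return (r2_m - r1_m) / (4.0 * math.log(lam1 / lam2))


def mean_molecular_mass_u(scale_height_m, temperature_k, gravity_ms2):
    """Mean molecular mass [u] from H = k_B T / (mu g)."""
    return K_B * temperature_k / (gravity_ms2 * scale_height_m) / AMU


if __name__ == "__main__":
    # Illustrative inputs: build two radii from an assumed 200 km scale height,
    # then recover the scale height and the mean molecular mass.
    h_true = 2.0e5                                    # [m]
    r_blue = 1.70e7                                   # radius at 0.5 micron [m]
    r_red = r_blue - 4.0 * h_true * math.log(0.8 / 0.5)  # radius at 0.8 micron [m]
    h = scale_height_rayleigh(r_blue, r_red, 0.5, 0.8)
    print(h, mean_molecular_mass_u(h, temperature_k=500.0, gravity_ms2=8.9))
```

For these inputs the recovered mean molecular mass is close to that of a hydrogen-dominated gas, and a relative error in the assumed temperature propagates linearly into the estimate.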

Given two transit depth observations at $\lambda_1$ and $\lambda_2$ in the Rayleigh scattering regime, we obtain the estimate for the
mean molecular mass